Community detection in general stochastic block models:
fundamental limits and efficient recovery algorithms 
Abstract

New phase transition phenomena have recently been discovered for the stochastic 
block model, for the special case of two non-overlapping symmetric communities. This 
gives rise in particular to new algorithmic challenges driven by the thresholds. This
paper investigates whether a general phenomenon takes place for multiple communities, 
without imposing symmetry. 

In the general stochastic block model SBM$(n,p,Q)$, $n$ vertices are split into $k$ communities
of relative size $\{p_i\}_{i\in[k]}$, and vertices in communities $i$ and $j$ connect independently
with probability $\{Q_{i,j}\}_{i,j\in[k]}$. This paper investigates the partial and exact recovery of
communities in the general SBM (in the constant and logarithmic degree regimes), and 
uses the generality of the results to tackle overlapping communities. 

The contributions of the paper are: (i) an explicit characterization of the recovery
threshold in the general SBM in terms of a new divergence function $D_+$, which generalizes
the Hellinger and Chernoff divergences, and which provides an operational meaning to
a divergence function analogous to the KL-divergence in the channel coding theorem, (ii)
the development of an algorithm that recovers the communities all the way down to the
optimal threshold and runs in quasi-linear time, showing that exact recovery has no
information-theoretic to computational gap for multiple communities, in contrast to the
conjectures made for detection with more than 4 communities; note that the algorithm is
optimal both in terms of achieving the threshold and in having quasi-linear complexity,
(iii) the development of an efficient algorithm that detects communities in the constant
degree regime with an explicit accuracy bound that can be made arbitrarily close to 1
when a prescribed signal-to-noise ratio (defined in terms of the spectrum of diag(p)Q)
tends to infinity.
1 Introduction


Detecting communities (or clusters) in graphs is a fundamental problem in computer science 
and machine learning. This applies to a large variety of complex networks in social sciences 
and biology, as well as to data sets engineered as networks via similarity graphs, where one
often attempts to get a first impression on the data by trying to identify groups with similar 
behavior. In particular, finding communities allows one to find like-minded people in social 
networks [GN02, NWS], to improve recommendation systems [LSY03, XWZ + 14], to segment 
or classify images [SM97, SHB07], to detect protein complexes [CY06, MPN + 99], to find 
genetically related sub-populations [PSD00, JTZ04], or to discover new tumor subclasses
[SPT+01]. 

While a large variety of community detection algorithms have been deployed in the past 
decades, understanding the fundamental limits of community detection and establishing 
rigorous benchmarks for algorithms remains a major challenge. Significant progress has 
recently been made for the stochastic block model, but mainly for the special case of 
two non-overlapping communities. The goal of this paper is to establish the fundamental 
limits of recovering communities in general stochastic block models, with multiple (possibly 
overlapping) communities. We first provide some motivations behind these questions. 

Probabilistic network models can be used to model real networks [New10], to study the
average-case complexity of NP-hard problems on graphs (such as min-bisection or max-cut 
[DF89, BCLS87, CK99, BS04]), or to set benchmarks for clustering algorithms with well 
defined ground truth. In particular, the latter holds irrespective of how exactly the model fits 
the data sets, and is a crucial aspect in community detection as a vast majority of algorithms 
are based on heuristics and no ground truth is typically available in applications. This is 
in particular a well-known challenge for Big Data problems where one cannot manually
determine the quality of the clusters [RDRI+14]. 

Evaluating the performance of algorithms on models is, however, non-trivial. In some 
regimes, most reasonable algorithms may succeed, while in others, algorithms may be doomed 
to fail due to computational barriers. Thus, an important question is to characterize the 
regimes where the clustering tasks can be solved efficiently or information-theoretically. In 
particular, models may benefit from asymptotic phase transition phenomena, which, in
addition to being mathematically interesting, make it possible to locate the bottleneck regimes
for benchmarking algorithms. Such phenomena are commonly used in coding theory (with the
channel capacity [Sha48]), or in constraint satisfaction problems (with the SAT thresholds, 
see [ANP05] and references therein). 

Recently, similar phenomena have been identified for the stochastic block model (SBM), 
one of the most popular network models exhibiting community structures [HLL83, WBB76, 
FMW85, WW87, BC09, KN11]. The model¹ was first proposed in the 80s [HLL83] and
received significant attention in the mathematics and computer science literature [BCLS87,
DF89, Bop87, JS98, CK99, CI01], as well as in the statistics and machine learning literature
[SN97, BC09, RCY11, CWA12]. The SBM puts a distribution on $n$-vertex graphs with
a hidden (or planted) partition of the nodes into $k$ communities. Denoting by $p_i$, $i \in [k]$,
the relative size of each community, and assuming that a pair of nodes in communities i 


1 See Section 5 for further references. 
and $j$ connects independently with probability $Q_{ij}$, the SBM can be defined by the triplet
$(n, p, Q)$, where $p$ is a probability vector of dimension $k$ and $Q$ a $k \times k$ symmetric matrix
with entries in $[0,1]$.

The SBM recently returned to the center of attention, both at the practical level,
due to extensions allowing overlapping communities [ABFX08] that have proved to fit
real data sets in massive networks well [GB13], and at the theoretical level, due to new
phase transition phenomena discovered for the two-community case [Co10, DKMZ11, Mas14,
MNS14, ABH14, MNSb]. To discuss these phenomena, we first need to introduce the figures
of merit (formal definitions are in Section 2):

• Weak recovery (also called detection). This only requires the algorithm to output a
partition of the nodes which is positively correlated with the true partition (whp²).
Note that weak recovery is relevant in the fully symmetric case where all nodes have
identical average degree,³ since otherwise weak recovery can be trivially solved. If the
model is perfectly symmetric, like the SBM with two equally-sized clusters having the
same connectivity parameters, then weak recovery is non-trivial. Full symmetry may
not be representative of reality, but it poses analytical and algorithmic challenges. The
weak-recovery threshold for two symmetric communities was achieved efficiently in
[Mas14, MNS14], settling a conjecture made in [DKMZ11]. The case with more
than two communities remains open.

• Partial recovery. One may ask the finer question of how much can be recovered
about the communities. For a given set of parameters of the block model, finding the
proportion of nodes (as a function of $p$ and $Q$) that can be correctly recovered (whp)
is an open problem. Obtaining a closed-form formula for this question is unlikely,
even in the symmetric case with two communities. Partial results were obtained in
[MNSa] for two symmetric communities, but the general problem remains open even
for determining scaling laws. One may also consider the special case of partial recovery
where only an $o(n)$ fraction of nodes is allowed to be misclassified (whp), called almost
exact recovery or weak consistency, but no sharp phase transition is to be expected for
this requirement.

• Exact recovery (also called recovery or strong consistency). Finally, one may ask
for the regimes for which an algorithm can recover the entire clusters (whp). This 
is non-trivial for both symmetric and asymmetric parameters. One can also study 
“partial-exact-recovery,” namely, which communities can be exactly recovered. While 
exact recovery has been the main focus in the literature for the past decades (see table 
in Section 5), the phase transition for exact recovery was only obtained last year for 
the case of two symmetric communities [ABH14, MNSb]. The case with more than 
two communities remains open. 

2 whp means with high probability, i.e., with probability $1 - o_n(1)$ when the number of nodes in the graph
diverges.

3 At least for the case of communities having linear size. One may otherwise define stronger notions of
weak recovery that apply to non-symmetric cases. 
This paper addresses items 2 and 3 for the general stochastic block model. Note that the
above questions naturally require studying different regimes for the parameters. Weak
recovery requires the edge probabilities to be $\Omega(1/n)$, in order to have many vertices in all
but one community be non-isolated (i.e., a giant component in the symmetric case), and
exact recovery requires the edge probabilities to be $\Omega(\ln(n)/n)$, in order to have all vertices in all
but one community be non-isolated (i.e., a connected graph in the symmetric case). The
difficulty is to understand how much more is needed in order to weakly or exactly recover
the communities. In particular, giant components and connectivity have phase transitions, and similar
phenomena may be expected for weak and exact recovery.

Note that these regimes are not only rich mathematically, but are also relevant for 
applications, as a vast collection of real networks ranging from social (LinkedIn, MSN),
collaborative (movies, arXiv), or biological (yeast) networks and more were shown to be
sparse [LLDM08, Str01]. Note however that the average degree is typically not small in real
networks, and it seems hard to distinguish between a large constant or a slowly growing 
function. Both regimes are of interest to us. 

Finally, there is an important distinction to be made between the information-theoretic
thresholds, which do not put constraints on the algorithm's complexity, and the computational
thresholds, which require polynomial-time algorithms. In the case of two symmetric
communities, the information-theoretic and computational thresholds were proved to be
the same for weak recovery [Mas14, MNS14] and exact recovery [ABH14, MNSb]. A gap
is conjectured to take place for weak recovery for more than 4 communities [MNS12]. No
conjectures were made for exact recovery for multiple communities.

This paper focuses on partial and exact recovery (items 2 and 3) for the general stochastic 
block model with linear size communities, and uses the generality of the results to address 
overlapping communities (see Section 4). Recall that for the case of two communities, if

$$q_{\text{in}} = a\ln(n)/n, \qquad q_{\text{out}} = b\ln(n)/n$$

are respectively the intra- and extra-cluster probabilities, with $a > b > 0$, then exact recovery
is possible if and only if

$$\sqrt{a} - \sqrt{b} > \sqrt{2}, \qquad (1)$$

and this is efficiently achievable. However, there is currently no general insight regarding
equation (1), as it emerges from estimating a tail event for Binomial random variables specific
to the case of two symmetric communities. Moreover, no results are known to prove partial
recovery bounds for more than two communities (recent progress was made in [CRV15]).
This represents a limitation of the current techniques, and an impediment to progress
towards more realistic network models that may have overlapping communities, and for
which analytical results are currently unknown.⁴ We next present our effort towards such a
general treatment.


4 Models other than the SBM allowing for overlapping communities have been studied, for example, in
[SA11].





2 Results 


The main advances of this paper are: 

(i) an algorithm (Sphere-comparison) that detects communities in the constant-degree
general SBM with an explicit accuracy guarantee, such that when a prescribed signal-to-noise
ratio, defined in terms of the ratio $|\lambda_{\min}|^2/\lambda_{\max}$ where $\lambda_{\min}$ and $\lambda_{\max}$ are
respectively the smallest⁵ and largest eigenvalues of diag(p)Q, tends to infinity, the
accuracy tends to 1 and the algorithm complexity becomes quasi-linear, i.e., $o(n^{1+\epsilon})$
for all $\epsilon > 0$,

(ii) an explicit characterization of the recovery threshold in the general SBM in terms of a
divergence function $D_+$, which provides a new operational meaning to a divergence
analogous to the KL-divergence in the channel coding theorem (see Section 2.3), and
which allows determining which communities can be recovered by solving a packing
problem in the appropriate embedding,

(iii) a quasi-linear time algorithm (Degree-profiling) that solves exact recovery whenever
it is information-theoretically solvable,⁶ showing in particular that there is no
information-theoretic to computational gap for exact recovery with multiple communities,
in contrast to the conjectures made for weak recovery. Note that the algorithm
statistically replicates the performance of maximum-likelihood estimation (which is NP-hard
in the worst case) with an optimal (i.e., quasi-linear) complexity. In particular, it
improves significantly on the SDPs developed for two communities (see Section 5) both
in terms of generality and complexity.

2.1 Definitions and terminologies 

The general stochastic block model, SBM(n,p, Q), is a random graph ensemble defined as 
follows: 

• $n$ is the number of vertices in the graph, and $V = [n]$ denotes the vertex set.

• Each vertex $v \in V$ is independently assigned a hidden (or planted) label $\sigma_v$ in $[k]$ under
a probability distribution $p = (p_1, \dots, p_k)$ on $[k]$. That is, $\mathbb{P}\{\sigma_v = i\} = p_i$, $i \in [k]$. We
also define $P = \mathrm{diag}(p)$.

• Each (unordered) pair of nodes $(u, v) \in V \times V$ is connected independently with
probability $Q_{\sigma_u,\sigma_v}$, where $Q_{\sigma_u,\sigma_v}$ is specified by a symmetric $k \times k$ matrix $Q$ with
entries in $[0,1]$.
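As an illustration of the sampling process just defined, the following is a minimal Python sketch (not code from this paper) that draws a graph from SBM$(n, p, Q)$ by looping over all $\binom{n}{2}$ pairs. The function names `sample_sbm` and `scale_matrix` are hypothetical, communities are indexed from 0, and the quadratic loop is chosen for clarity rather than speed.

```python
import random
from itertools import combinations


def scale_matrix(Q, factor):
    """Return factor * Q, e.g. factor = 1/n or ln(n)/n (hypothetical helper)."""
    return [[factor * q for q in row] for row in Q]


def sample_sbm(n, p, Q, seed=None):
    """Draw (labels, edges) from SBM(n, p, Q).

    labels[v] is the hidden community of vertex v, in {0, ..., k-1};
    edges is a set of unordered pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    k = len(p)
    # Each vertex independently receives label i with probability p[i].
    labels = rng.choices(range(k), weights=p, k=n)
    edges = set()
    # Each unordered pair is connected independently with prob. Q[sigma_u][sigma_v].
    for u, v in combinations(range(n), 2):
        if rng.random() < Q[labels[u]][labels[v]]:
            edges.add((u, v))
    return labels, edges
```

For example, with `import math`, the call `sample_sbm(n, p, scale_matrix(Q, math.log(n) / n))` draws from the logarithmic-degree regime, provided all scaled entries stay in $[0,1]$.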

The above gives a distribution on $n$-vertex graphs. Note that $G \sim$ SBM$(n,p,Q)$ denotes
a random graph drawn under this model, without the hidden (or planted) clusters (i.e., the
labels $\sigma_v$) revealed. The goal is to recover these labels by observing only the graph.

5 The smallest eigenvalue of diag(p)Q is the one with the least magnitude.

6 Assuming that the entries of $Q$ are non-zero; see Remark 1 for zero entries.





This paper focuses on p independent of n (the communities have linear size), Q dependent 
on n such that the average node degrees are either constant or logarithmically growing and 
k fixed. These assumptions on p and k could be relaxed, for example to slowly growing 
k , but we leave this for future work. As discussed in the introduction, the above regimes 
for Q are both motivated by applications and by the fact that interesting mathematical 
phenomena take place in these regimes. For convenience, we introduce specific notation for
the model in these regimes:

Definition 1. For a symmetric matrix $Q \in \mathbb{R}_+^{k \times k}$,

• $G_1(n,p,Q)$ denotes SBM$(n,p,Q/n)$,

• $G_2(n,p,Q)$ denotes SBM$(n,p,\ln(n)Q/n)$.

We now discuss the recovery requirements. 

Definition 2. (Partial recovery.) An algorithm recovers or detects communities in SBM$(n,p,Q)$
with an accuracy of $\alpha \in [0,1]$ if it outputs a labelling of the nodes $\{\sigma'(v), v \in V\}$ that
agrees with the true labelling $\sigma$ on a fraction $\alpha$ of the nodes with probability $1 - o_n(1)$. The
agreement is maximized over relabellings of the communities.
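The accuracy in Definition 2 can be computed with a short Python sketch such as the one below (the function name `agreement` is hypothetical). It tries all $k!$ relabellings, which is fine for the small, fixed $k$ considered here but would not scale to many communities.

```python
from itertools import permutations


def agreement(true_labels, est_labels, k):
    """Fraction of vertices on which est_labels agrees with true_labels,
    maximized over all relabellings (permutations) of the k communities."""
    n = len(true_labels)
    best = 0
    for perm in permutations(range(k)):
        # perm maps an estimated label to a candidate true label.
        matches = sum(1 for t, e in zip(true_labels, est_labels) if perm[e] == t)
        best = max(best, matches)
    return best / n
```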

Definition 3. (Exact recovery.) Exact recovery is solvable in SBM$(n,p,Q)$ for a community
partition $[k] = \sqcup_{s=1}^t A_s$, where $A_s$ is a subset of $[k]$, if there exists an algorithm that takes
$G \sim$ SBM$(n,p,Q)$ and assigns to each node in $G$ an element of $\{A_1, \dots, A_t\}$ that contains
its true community⁷ with probability $1 - o_n(1)$. Exact recovery is solvable in SBM$(n,p,Q)$ if
it is solvable for the partition of $[k]$ into $k$ singletons, i.e., all communities can be recovered.
The problem is solvable information-theoretically if there exists an algorithm that solves it,
and efficiently if the algorithm runs in polynomial time in $n$.

Note that exact recovery for the partition $[k] = \{i\} \sqcup ([k] \setminus \{i\})$ is equivalent to extracting
community $i$. In general, recovering a partition $[k] = \sqcup_{s=1}^t A_s$ is equivalent to merging the
communities that are in a common subset $A_s$ and recovering the merged communities. Note
also that exact recovery in SBM$(n,p,Q)$ requires the graph not to have vertices of degree 0
in multiple communities (with high probability). In the symmetric case, this amounts to
asking for connectivity. Therefore, for exact recovery, we will focus below on connectivity matrices
scaling like $\ln(n)Q/n$, where $Q$ is a fixed matrix, i.e., on the $G_2(n,p,Q)$ model.

2.2 Main results 

We next present our main results and algorithms for partial and exact recovery in the general 
SBM. We present slightly simplified versions in this section, and provide full statements in 
Sections 7 and 8. 

The CH-embedding and exact recovery. We first explain how to identify the communities
that can be extracted from a graph drawn under $G_2(n,p,Q)$. First define the community
profile of community $i \in [k]$ by the vector

$$\theta_i := (PQ)_i \in \mathbb{R}_+^k, \qquad (2)$$

7 This is again up to relabellings of the communities.
i.e., the $i$-th column of the matrix diag(p)Q. Note that $\|\theta_i\|_1 \ln(n)$ gives the average degree
of a node in community i. Two communities having the same community profile cannot be 
distinguished, in that the random graph distribution is invariant under any permutation 
of the nodes in these communities. Intuitively, one would expect that the further “apart” 
the community profiles are, the easier it should be to distinguish the communities. The 
challenge is to quantify what “apart” means, and whether there exists a proper distance 
notion to rely on. We found that the following function gives the appropriate notion, 

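$$D_+ : \mathbb{R}_+^k \times \mathbb{R}_+^k \to \mathbb{R}_+, \qquad (\theta_i, \theta_j) \mapsto D_+(\theta_i, \theta_j) = \max_{t \in [0,1]} \sum_{x \in [k]} \left( t\theta_i(x) + (1-t)\theta_j(x) - \theta_i(x)^t \theta_j(x)^{1-t} \right). \qquad (3)$$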

For a fixed $t$, the above is a so-called $f$-divergence (obtained for $f(x) = 1 - t + tx - x^t$),
a family of divergences generalizing the KL-divergence (relative entropy), defined in
[Csi63, Mor63, AS66] and used in information theory and statistics. As explained in Section
2.3, $D_+$ can be viewed as a generalization of the Hellinger divergence (obtained for $t = 1/2$)
and the Chernoff divergence. We therefore call $D_+$ the Chernoff-Hellinger (CH) divergence.
Note that for the case of two symmetric communities, $D_+(\theta_1, \theta_2) = \frac{1}{2}(\sqrt{a} - \sqrt{b})^2$, recovering
the result in [ABH14, MNSb].
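To make Eq. (3) concrete, here is a minimal Python sketch (hypothetical helper names `community_profiles`, `ch_divergence`, `exact_recovery_solvable`) that computes the profiles $\theta_i = (PQ)_i$ and $D_+$, assuming all entries of $Q$ are positive. Since $\theta_i(x)^t\theta_j(x)^{1-t}$ is convex in $t$, the objective in Eq. (3) is concave in $t$, so a ternary search over $[0,1]$ suffices; this numerical choice is ours, not the paper's. The last helper applies the criterion $\min_{i \neq j} D_+((PQ)_i, (PQ)_j) \geq 1$ from Theorem 1 below; floating-point error makes it unreliable for parameters exactly at the threshold.

```python
def community_profiles(p, Q):
    """theta_i = (PQ)_i, the i-th column of diag(p) Q, as a list of k vectors."""
    k = len(p)
    return [[p[x] * Q[x][i] for x in range(k)] for i in range(k)]


def ch_divergence(theta_i, theta_j, iters=200):
    """Approximate D_+(theta_i, theta_j) of Eq. (3) by ternary search over t in [0, 1].
    Assumes strictly positive entries."""
    def objective(t):
        return sum(t * a + (1 - t) * b - a ** t * b ** (1 - t)
                   for a, b in zip(theta_i, theta_j))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        # The objective is concave in t, so the maximizer stays in [lo, hi].
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)


def exact_recovery_solvable(p, Q):
    """Check min_{i != j} D_+((PQ)_i, (PQ)_j) >= 1, up to numerical error."""
    th = community_profiles(p, Q)
    k = len(p)
    return min(ch_divergence(th[i], th[j])
               for i in range(k) for j in range(k) if i != j) >= 1
```

With $p = (1/2, 1/2)$ and $Q$ having $a$ on the diagonal and $b$ elsewhere, the maximizer is $t = 1/2$ and the value matches $\frac{1}{2}(\sqrt{a}-\sqrt{b})^2$.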

To determine which communities can be recovered, partition the community profiles into
the largest collection of disjoint subsets such that the CH-divergence among these subsets
is at least 1 (where the CH-divergence between two subsets of profiles is the minimum of
the CH-divergence between any two profiles, one from each subset). We refer to this as the finest
partition of the communities. Figure 1 illustrates this partition. The theorem below shows
that this is indeed the most granular partition that can be recovered about the communities;
in particular, it characterizes the information-theoretic and computational threshold for
exact recovery.
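The finest partition can be computed with a union-find pass: any two profiles at CH-divergence below 1 must lie in the same subset, so the finest partition is given by the connected components of the graph linking such pairs, and distinct components are then at CH-divergence at least 1. The following minimal Python sketch (hypothetical name `finest_partition`) takes the divergence as a parameter so it can be paired with any implementation of $D_+$; the usage check uses a toy one-dimensional stand-in for $D_+$ purely for illustration.

```python
def finest_partition(thetas, div):
    """Group indices of profiles into connected components of the graph
    whose edges join pairs (i, j) with div(thetas[i], thetas[j]) < 1."""
    k = len(thetas)
    parent = list(range(k))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(k):
        for j in range(i + 1, k):
            if div(thetas[i], thetas[j]) < 1:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(k):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```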


[Figure 1 image; surviving label: $D_+(\theta_1, \theta_2) \geq 1$]

Figure 1: Finest partition: To determine which communities can be recovered in the SBM
$G_2(n,p,Q)$, embed each community with its community profile $\theta_i = (PQ)_i$ in $\mathbb{R}_+^k$ and find
the partition of $\theta_1, \dots, \theta_k$ into the largest number of subsets that are at CH-divergence at
least 1 from each other.
Theorem 1. (See Theorem 6)

• Exact recovery is information-theoretically solvable in the stochastic block model
$G_2(n,p,Q)$ for a partition $[k] = \sqcup_{s=1}^t A_s$ if and only if for all $i$ and $j$ in different
subsets of the partition,

$$D_+((PQ)_i, (PQ)_j) \geq 1. \qquad (4)$$

In particular, exact recovery is information-theoretically solvable in $G_2(n,p,Q)$ if and
only if $\min_{i,j \in [k], i \neq j} D_+((PQ)_i, (PQ)_j) \geq 1$.

• The Degree-profiling algorithm (see Section 3.2) recovers the finest partition with
probability $1 - o_n(1)$ and runs in $o(n^{1+\epsilon})$ time for all $\epsilon > 0$. In particular, exact
recovery is efficiently solvable whenever it is information-theoretically solvable.

This theorem assumes that the entries of Q are non-zero, see Remark 1 for zero entries. 
To achieve this result we rely on a two-step procedure. First an algorithm is developed to
recover all but a vanishing fraction of nodes — this is the main focus of our partial recovery 
result next discussed — and then a procedure is used to “clean up” the leftover graphs using 
the node degrees of the preliminary classification. This turns out to be much more efficient 
than aiming for an algorithm that directly achieves exact recovery. This strategy was already 
used in [ABH14] for the two-community case, and appeared also in earlier works such as 
[DF89, AK94]. The problem is much more involved here as no algorithm is known to ensure 
partial recovery in the general SBM, and as classifying the nodes based on their degrees 
requires solving a general hypothesis testing problem for the degree-profiles in the SBM 
(rather than evaluating tail events of Binomial distributions). The latter part reveals the CH- 
divergence as the threshold for exact recovery. We next present our result for partial recovery. 


Partial recovery. We obtain an algorithm that recovers the communities with an accuracy 
bound that tends to 1 when the average degree of the nodes gets large, and which runs in 
quasi-linear time. 

Theorem 2 (See Theorem 4). Given any $k \in \mathbb{Z}$, $p \in (0,1)^k$ with $|p| = 1$, and symmetric
matrix $Q$ with no two rows equal, let $\lambda$ be the largest eigenvalue of $PQ$, and $\lambda'$ be the
eigenvalue of $PQ$ with the smallest nonzero magnitude. If the following signal-to-noise ratio
(SNR) $\rho$ satisfies

$$\rho := \frac{(\lambda')^2}{\lambda} > 4, \qquad (5)$$

$$\lambda^7 < (\lambda')^8, \qquad (6)$$

$$4\lambda^3 < (\lambda')^4, \qquad (7)$$

then for some $\epsilon = \epsilon(\lambda, \lambda')$ and $C = C(p,Q) > 0$, the algorithm Sphere-comparison (see
Section 3.1) detects with high probability communities in graphs drawn from $G_1(n,p,Q)$ with
accuracy

$$1 - \frac{4k\, e^{-\frac{C\rho}{16k}}}{1 - e^{-\frac{C\rho}{16k}\left(\frac{(\lambda')^4}{\lambda^3} - 1\right)}}, \qquad (8)$$

provided that the above is larger than $1 - \frac{\min_i p_i}{2}$, and runs in $O(n^{1+\epsilon})$ time. Moreover, $\epsilon$
can be made arbitrarily small with $8\ln(\lambda\sqrt{2}/|\lambda'|)/\ln(\lambda)$, and $C(p, \alpha Q)$ is independent of $\alpha$.
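The hypotheses (5)-(7) of Theorem 2 depend only on the spectrum of $PQ$, so they can be checked numerically. The following is a minimal sketch assuming NumPy is available; the function name `theorem2_conditions` and the tolerance used to discard zero eigenvalues are our own choices. Since $PQ$ is similar to the symmetric matrix $P^{1/2} Q P^{1/2}$ when all $p_i > 0$, its eigenvalues are real and only their real parts are kept. The constants $C$ and $\epsilon$ of the theorem are not computed.

```python
import numpy as np


def theorem2_conditions(p, Q, tol=1e-12):
    """Return (rho, holds): the SNR rho = (lambda')^2 / lambda of Eq. (5)
    and whether conditions (5)-(7) all hold."""
    PQ = np.diag(p) @ np.asarray(Q, dtype=float)
    # PQ is similar to a symmetric matrix, so its spectrum is real.
    eig = np.linalg.eigvals(PQ).real
    lam = eig.max()                    # largest eigenvalue lambda
    mags = np.abs(eig)
    lam_p = mags[mags > tol].min()     # |lambda'|: smallest nonzero magnitude
    # Only even powers of lambda' appear, so its sign does not matter.
    rho = lam_p ** 2 / lam
    holds = (rho > 4) and (lam ** 7 < lam_p ** 8) and (4 * lam ** 3 < lam_p ** 4)
    return float(rho), bool(holds)
```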


We next detail what the previous theorem gives in the case of $k$ symmetric clusters.

Corollary 1. Consider the $k$-block symmetric case. In other words, $p_i = 1/k$ for all $i$, and
$Q_{i,j}$ is $\alpha$ if $i = j$ and $\beta$ otherwise. The vector whose entries are all 1s is an eigenvector of
$PQ$ with eigenvalue $\frac{\alpha + (k-1)\beta}{k}$, and every vector whose entries add up to 0 is an eigenvector
of $PQ$ with eigenvalue $\frac{\alpha - \beta}{k}$. So, $\lambda = \frac{\alpha + (k-1)\beta}{k}$ and $\lambda' = \frac{\alpha - \beta}{k}$, and

$$\rho > 4 \iff \frac{(\alpha - \beta)^2}{k(\alpha + (k-1)\beta)} > 4, \qquad (9)$$

which is the signal-to-noise ratio appearing in the conjectures on the detection threshold
for multiple blocks [DKMZ11, MNS12]. Then, as long as $k(\alpha + (k-1)\beta)^7 < (\alpha - \beta)^8$ and
$4k(\alpha + (k-1)\beta)^3 < (\alpha - \beta)^4$, there exists a constant $c > 0$ (see Corollary 4 for details on $c$)
such that Sphere-comparison detects communities, and the accuracy is

$$1 - O\left(e^{-c(\alpha - \beta)^2/(k(\alpha + (k-1)\beta))}\right)$$

for sufficiently large $(\alpha - \beta)^2/(k(\alpha + (k-1)\beta))$.
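The eigenvector claims in Corollary 1 are easy to check numerically. The following minimal Python sketch (hypothetical names `pq_times` and `symmetric_block_params`) multiplies $PQ$ by the all-ones vector and by a zero-sum vector, and evaluates the closed forms for $\lambda$, $\lambda'$, $\rho$ and the two polynomial conditions stated in the corollary.

```python
def pq_times(k, alpha, beta, x):
    """Compute PQ x for p_i = 1/k and Q with alpha on the diagonal, beta elsewhere."""
    return [sum((alpha if i == j else beta) * x[j] for j in range(k)) / k
            for i in range(k)]


def symmetric_block_params(k, alpha, beta):
    """Closed forms of Corollary 1: (lambda, lambda', rho, all conditions hold)."""
    lam = (alpha + (k - 1) * beta) / k
    lam_p = (alpha - beta) / k
    rho = (alpha - beta) ** 2 / (k * (alpha + (k - 1) * beta))
    # Conditions (6) and (7) after multiplying through by powers of k.
    cond6 = k * (alpha + (k - 1) * beta) ** 7 < (alpha - beta) ** 8
    cond7 = 4 * k * (alpha + (k - 1) * beta) ** 3 < (alpha - beta) ** 4
    return lam, lam_p, rho, rho > 4 and cond6 and cond7
```

For instance, with $k = 3$, $\alpha = 10$, $\beta = 1$, the all-ones vector is scaled by $4 = \lambda$ and $(1, -1, 0)$ by $3 = \lambda'$, giving $\rho = 9/4$, below the threshold of (5).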

The following is an important consequence of the previous theorem, as it shows that
Sphere-comparison achieves almost exact recovery when the entries of $Q$ are amplified.

Corollary 2. For any $k \in \mathbb{Z}$, $p \in (0,1)^k$ with $|p| = 1$, and symmetric matrix $Q$ with no two
rows equal, there exist $\epsilon(\delta) = O(1/\ln(\delta))$ and a constant $c_1 > 0$ such that for all sufficiently
large $\delta$ there exists an algorithm (Sphere-comparison) that detects communities in graphs
drawn from $G_1(n, p, \delta Q)$ with accuracy at least $1 - O_\delta(e^{-c_1\delta})$ in $O_n(n^{1+\epsilon(\delta)})$ time for all
sufficiently large $n$.


2.3 Information theoretic interpretation of the results 

We give in this section an interpretation of Theorem 6 related to Shannon’s channel coding 
theorem in information theory. At a high level, clustering the SBM is similar to reliably 
decoding a codeword on a channel which is non-conventional in information theory. The 
channel inputs are the nodes’ community assignments and the channel outputs are the 
network edges. We next show that this analogy is more than just high-level: reliable 
communication on this channel is equivalent to exact recovery, and Theorem 6 shows that 
the “clustering capacity” is obtained from the CH-divergence of the channel kernel $PQ$, which
is an $f$-divergence like the KL-divergence governing the communication capacity.

Consider the problem of transmitting a string of $n$ $k$-ary information bits on a memoryless
channel. Namely, let $X_1, \dots, X_n$ be i.i.d. from a distribution $p$ on $[k]$, the input distribution,
and assume that we want to transmit those $k$-ary bits on a memoryless channel whose
one-time probability transition is $W$. This requires using a code, which embeds⁸ the


8 This embedding is injective.







vector $X^n = (X_1, \dots, X_n)$ into a larger dimension vector $U^N = (U_1, \dots, U_N)$, the codeword
($N \geq n$), such that the corrupted version of $U^N$ that the memoryless channel produces,
say $Y^N$, still allows recovery of the original $U^N$ (hence $X^n$) with high probability over the
channel corruptions. In other words, a code design provides the map $C$ from $X^n$ to $U^N$ (see
Figure 2a), and a decoding map that allows recovery of $X^n$ from $Y^N$ with a vanishing error
probability (i.e., reliable communication).

Of course, if $n = N$, the encoder $C$ is just a one-to-one map, and there is no hope of
defeating the corruptions of the channel $W$ unless the channel is deterministic to start with.
The purpose of the channel coding theorem is to quantify the best tradeoffs between $n$, $N$
and the amount of randomness in $W$ for which one can reliably communicate. When the
channel is fixed and memoryless, $N$ can grow linearly with $n$, and defining the code rate
by $R = n/N$, Shannon's coding theorem tells us that $R$ is achievable (i.e., there exists an
encoder and decoder that allow for reliable communication at that rate) if and only if

$$R < \max_p I(p, W), \qquad (10)$$

where $I(p, W)$ is the mutual information of the channel $W$ for the input distribution $p$,
defined as

$$I(p, W) = D(p \circ W \,\|\, p \times pW) = \sum_{x,y} p(x) W(y|x) \log \frac{p(x)W(y|x)}{p(x)\sum_u p(u)W(y|u)}. \qquad (11)$$

Note that the channel capacity $\max_p I(p, W)$ is expressed in terms of the Kullback-Leibler
divergence (relative entropy) between the joint and product distributions of the channel.
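As a sanity check on Eq. (11), the following minimal Python sketch (hypothetical name `mutual_information`, natural logarithms) evaluates $I(p, W)$ for a discrete memoryless channel given as a row-stochastic matrix `W[x][y]` $= W(y|x)$. For a binary symmetric channel with crossover probability $\varepsilon$ and uniform input it returns $\ln 2 + \varepsilon\ln\varepsilon + (1-\varepsilon)\ln(1-\varepsilon)$, which is the capacity of that channel in nats.

```python
import math


def mutual_information(p, W):
    """I(p, W) in nats, Eq. (11), for input distribution p and W[x][y] = W(y|x)."""
    # Output distribution: sum_u p(u) W(y|u).
    out = [sum(p[u] * W[u][y] for u in range(len(p))) for y in range(len(W[0]))]
    total = 0.0
    for x, px in enumerate(p):
        for y, wyx in enumerate(W[x]):
            # Terms with p(x) = 0 or W(y|x) = 0 contribute nothing.
            if px > 0 and wyx > 0:
                total += px * wyx * math.log(wyx / out[y])
    return total
```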



[Figure 2 image; panels: (a) An encoder $C$ for data transmission. (b) The SBM encoder for network modelling.]

Figure 2: Clustering over the SBM can be related to channel coding over a discrete memoryless
channel, for a different type of encoder and one-time channel.


We now explain how this relates to our Theorem 6. Clustering the SBM can be cast as
a decoding problem on a channel similar to the above. The $n$ $k$-ary information bits $X^n$
represent the community assignments to the $n$ nodes in the network. As for channel coding,
these are assumed to be i.i.d. under some prior distribution $p$ on $[k]$. However, clustering differs
from coding in several important ways. First of all, we have no freedom in the choice
of the encoder $C$. The encoder is part of the model, and in the SBM, $C$ takes all $\binom{n}{2}$ possible
pairs of information bits. In other words, the SBM corresponds to a specific encoder
which has only degree 2 on the check-nodes (the squared nodes in Figure 2b) and for which
$N = \binom{n}{2}$. Next, as in channel coding, the SBM assumes that the codeword is corrupted by
a memoryless channel, which takes the two selected $k$-ary bits and maps them to an edge
variable (presence or absence of an edge) with a channel $W$ defined by the connectivity matrix:


$$W(1 | x_1, x_2) = q_{x_1, x_2}, \qquad (12)$$

$$W(0 | x_1, x_2) = 1 - q_{x_1, x_2}, \qquad (13)$$


where $q$ scales with $n$ here. Hence, the SBM can be viewed as a specific encoder on a
memoryless channel defined by the connectivity matrix $q$. We removed half of the degrees of
freedom from channel coding (i.e., the encoder and $p$ are fixed), but the goal of clustering is
otherwise similar to channel coding: design a decoding map that recovers the information
$k$-ary bits $X^n$ from the network $Y^N$ with a vanishing error probability. In particular, exact
recovery is equivalent to reliable communication.

A naive guess would be that some mutual information derived from the input distribution
$p$ and the channel induced from $q$ could give the fundamental tradeoffs, as for channel coding.
However, this is where the difference between coding and clustering is important. An encoder
that achieves capacity in the traditional setting is typically “well spread,” for example, like
a random code which picks each edge in the bipartite graph of Figure 2a with probability
one half. The SBM encoder, instead, is structured in a very special way, which may not be
well suited for communication purposes.⁹ This of course makes sense, as the formation of a
real network should have nothing to do with the design of an engineering system. Note also
that the code rate in the SBM channel is fixed to $R = n/\binom{n}{2}$, which means that there is
hope to still decode such a “poor” code, even on a very noisy channel.
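Spelling out the rate of the SBM code:

$$R = \frac{n}{\binom{n}{2}} = \frac{n}{n(n-1)/2} = \frac{2}{n-1} \longrightarrow 0 \quad (n \to \infty).$$

The rate thus vanishes as $n$ grows, in contrast with the fixed positive rates considered in the channel coding theorem, where $N$ grows only linearly with $n$.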

Theorem 6 shows that indeed a similar phenomenon to channel coding takes place for
clustering. Namely, there exists a notion of “capacity,” governed not by the KL-divergence but by
the CH-divergence introduced in Section 2.2. The resulting capacity captures whether reliable
communication is possible or not. The relevant regime is for $q$ that scales like $\ln(n)Q/n$, and
the theorem says that it is possible to decode the inputs (i.e., to recover the communities) if
and only if

$$1 \leq J(p, Q), \qquad (14)$$

where

$$J(p, Q) = \min_{i, j \in [k], i \neq j} D_+((PQ)_i, (PQ)_j). \qquad (15)$$

Note again the difference with the channel coding theorem: here we cannot optimize over p 
(since the community sizes are not a design parameter), and the rate R is fixed. One could 


9 It corresponds for example to a 2-right degree LDGM code in the case of the symmetric two-community 
SBM, a code typically not used for communication purposes. 
change the latter requirement, defining a model where the information about the edges is
only revealed at a given rate, in which case the analogy with Shannon's theorem can be
made even stronger (see for example [ABBS14a]).

The conclusion is that we can characterize the fundamental limit for clustering, with a
sharp transition governed by a measure of the channel “noisiness,” that is related to the 
KL-divergence used in the channel coding theorem. This is due to the hypothesis testing 
procedures underneath both frameworks (see Section 8.2). Defining 

$$D_t(\mu, \nu) := \sum_{x \in [k]} \left( t\mu(x) + (1-t)\nu(x) - \mu(x)^t \nu(x)^{1-t} \right), \qquad (16)$$


we have that

• $D_t$ is an $f$-divergence, that is, it can be expressed as $\sum_x \nu(x) f(\mu(x)/\nu(x))$, where $f(x) = 1 - t + tx - x^t$, which is convex for $t \in [0,1]$. The family of $f$-divergences was defined in [Csi63, Mor63, AS66] and shown to have various common properties when $f$ is convex. Note that the KL-divergence is also an $f$-divergence, for the convex function $f(x) = x\ln(x)$;

• $D_{1/2}(\mu,\nu) = \frac{1}{2}\|\sqrt{\mu} - \sqrt{\nu}\|_2^2$ is the Hellinger divergence (or distance); in particular, $t = 1/2$ is the maximizer for the case of two symmetric communities, recovering the expression $\frac{1}{2}(\sqrt{a} - \sqrt{b})^2$ obtained in [ABH14, MNSb];

• $D_t(\mu,\nu) = t\|\mu\|_1 + (1-t)\|\nu\|_1 - e^{-(1-t)D_t(\mu\|\nu)}$, where $D_t(\cdot\|\cdot)$ is the Rényi divergence, and the maximization over $t$ of this divergence is the Chernoff divergence.
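The three identities listed above can be checked numerically. The short Python sketch below is only an illustration: the helper names (`d_t`, `f_divergence`, `renyi`) and the test vectors are our own hypothetical choices, and all entries are assumed strictly positive.

```python
import numpy as np

def d_t(mu, nu, t):
    """D_t(mu, nu) = sum_x (t mu(x) + (1-t) nu(x) - mu(x)^t nu(x)^(1-t)), as in (16)."""
    return float(np.sum(t * mu + (1 - t) * nu - mu**t * nu**(1 - t)))

def f_divergence(mu, nu, f):
    """Generic f-divergence sum_x nu(x) f(mu(x) / nu(x)); assumes nu > 0 entrywise."""
    return float(np.sum(nu * f(mu / nu)))

def renyi(mu, nu, t):
    """Renyi divergence of order t in (0, 1): ln(sum_x mu^t nu^(1-t)) / (t - 1)."""
    return float(np.log(np.sum(mu**t * nu**(1 - t))) / (t - 1))

# Illustrative positive vectors (hypothetical values).
mu = np.array([2.0, 0.5, 1.0])
nu = np.array([0.3, 1.5, 1.0])
t = 0.3

direct = d_t(mu, nu, t)
# First bullet: D_t is the f-divergence for f(z) = 1 - t + t z - z^t.
via_f = f_divergence(mu, nu, lambda z: 1 - t + t * z - z**t)
# Second bullet: D_{1/2} = (1/2) ||sqrt(mu) - sqrt(nu)||_2^2.
hellinger = 0.5 * float(np.sum((np.sqrt(mu) - np.sqrt(nu)) ** 2))
# Third bullet: D_t = t ||mu||_1 + (1-t) ||nu||_1 - exp(-(1-t) Renyi_t(mu || nu)).
via_renyi = t * mu.sum() + (1 - t) * nu.sum() - np.exp(-(1 - t) * renyi(mu, nu, t))

print(direct, via_f, via_renyi)      # three equal values (up to rounding)
print(d_t(mu, nu, 0.5), hellinger)   # two equal values (up to rounding)
```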

As a result, $D_+$ can be viewed as a generalization of the Hellinger and Chernoff divergences. We hence call it the Chernoff-Hellinger (CH) divergence. Theorem 6 hence gives an operational meaning to $D_+$ through the community recovery problem. It further shows that the limit can be achieved efficiently.
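As a sketch of how $D_+$ and the quantity $J(p,Q)$ of (15) might be evaluated numerically, the Python code below maximizes $D_t$ over $t \in [0,1]$ by ternary search. This relies on the observation that $t \mapsto \sum_x \mu(x)^t\nu(x)^{1-t}$ is convex, so that $D_t$ is concave in $t$; it also assumes strictly positive entries and reads $(PQ)_i$ as the $i$-th column of $PQ$. The function names are hypothetical.

```python
import numpy as np

def d_t(mu, nu, t):
    """D_t(mu, nu) from (16)."""
    return float(np.sum(t * mu + (1 - t) * nu - mu**t * nu**(1 - t)))

def ch_divergence(mu, nu, iters=200):
    """D_+(mu, nu) = max over t in [0, 1] of D_t(mu, nu), by ternary search.

    t -> mu^t nu^(1-t) = nu exp(t ln(mu/nu)) is convex for positive entries,
    so D_t is concave in t and ternary search converges to the maximum."""
    mu, nu = np.asarray(mu, float), np.asarray(nu, float)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if d_t(mu, nu, m1) < d_t(mu, nu, m2):
            lo = m1
        else:
            hi = m2
    return d_t(mu, nu, (lo + hi) / 2)

def recovery_threshold(p, Q):
    """J(p, Q) = min over i != j of D_+((PQ)_i, (PQ)_j), reading (PQ)_i as column i of diag(p) Q."""
    PQ = np.diag(np.asarray(p, float)) @ np.asarray(Q, float)
    k = PQ.shape[0]
    return min(ch_divergence(PQ[:, i], PQ[:, j])
               for i in range(k) for j in range(i + 1, k))

# Two symmetric communities: J = (sqrt(a) - sqrt(b))^2 / 2.
print(recovery_threshold([0.5, 0.5], [[9, 1], [1, 9]]))  # about 2.0, above 1
print(recovery_threshold([0.5, 0.5], [[5, 1], [1, 5]]))  # about 0.76, below 1
```

For two symmetric communities with $p = (1/2, 1/2)$, diagonal entries $a$ and off-diagonal entries $b$, the maximizer is $t = 1/2$ and $J(p,Q) = \frac{1}{2}(\sqrt{a}-\sqrt{b})^2$; with $a = 9$, $b = 1$ this gives $J = 2 > 1$, while $a = 5$, $b = 1$ gives $J \approx 0.76 < 1$.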

3 Proof Techniques and Algorithms 

3.1 Partial recovery and the Sphere-comparison algorithm 

The first key observation used to classify the vertices of a graph is that if $v$ is a vertex in a graph drawn from $G_1(n,p,Q)$ then, for all small $r$, the expected number of vertices in community $i$ that are $r$ edges away from $v$ is approximately $e_i \cdot (PQ)^r e_{\sigma_v}$. So, we define:

Definition 4. For any vertex $v$, let $N_r(v)$ be the set of all vertices with shortest path to $v$ of length $r$. If there are multiple graphs that $v$ could be considered a vertex in, let $N_{r[G]}(v)$ be the set of all vertices with shortest paths in $G$ to $v$ of length $r$.

We also refer to the vector whose $i$-th entry equals the number of vertices in $N_r(v)$ that are in community $i$ as $N_r(v)$. One could determine $e_{\sigma_v}$ given $(PQ)^r e_{\sigma_v}$, but using $N_r(v)$ to approximate the latter would require knowing how many of the vertices in $N_r(v)$ are in each community. So, we attempt to get information about how many vertices in $N_r(v)$ are in each community by checking how $N_r(v)$ connects to $N_{r'}(v')$ for some vertex $v'$ and integer $r'$. The obvious way to do this would be to compute the cardinality of their intersection. Unfortunately, whether a given vertex in community $i$ is in $N_r(v)$ is not independent of whether it is in $N_{r'}(v')$, which causes $|N_r(v) \cap N_{r'}(v')|$ to deviate from its naive expected value badly enough to disrupt plans to use it for approximations.
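The spheres $N_r(v)$ of Definition 4 and the associated community-count vectors can be computed by breadth-first search. The following Python sketch assumes the graph is given as an adjacency dictionary and the (true or alleged) labels as a dictionary `sigma` with communities numbered from 0; these representations and the function names are our own, not the paper's.

```python
from collections import deque

def spheres(adj, v, r_max):
    """[N_0(v), ..., N_{r_max}(v)]: N_r(v) holds the vertices at shortest-path
    distance exactly r from v (Definition 4). adj maps a vertex to its neighbours."""
    dist = {v: 0}
    layers = [{v}]
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r_max:
            continue  # no need to look beyond radius r_max
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] == len(layers):
                    layers.append(set())
                layers[dist[w]].add(w)
                queue.append(w)
    while len(layers) <= r_max:
        layers.append(set())  # the component ran out of vertices
    return layers

def count_vector(vertex_set, sigma, k):
    """Entry i is the number of vertices of vertex_set in community i (0-indexed)."""
    counts = [0] * k
    for u in vertex_set:
        counts[sigma[u]] += 1
    return counts

adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
sigma = {0: 0, 1: 1, 2: 0, 3: 1, 4: 1}
layers = spheres(adj, 0, 3)
print(layers)                                        # [{0}, {1}, {2, 4}, {3}]
print([count_vector(s, sigma, 2) for s in layers])   # [[1, 0], [0, 1], [1, 1], [0, 1]]
```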

In order to get around this, we randomly assign every edge in $G$ to a set $E$ with probability $c$. We hence define the following.

Definition 5. For any vertices $v, v' \in G$, $r, r' \in \mathbb{Z}$, and subset $E$ of $G$’s edges, let $N_{r,r'[E]}(v \cdot v')$ be the number of pairs of vertices $(v_1, v_2)$ such that $v_1 \in N_{r[G\setminus E]}(v)$, $v_2 \in N_{r'[G\setminus E]}(v')$, and $(v_1, v_2) \in E$.

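Note that $E$ and $G\setminus E$ are disjoint; however, $G$ is sparse enough that even if they were generated independently, a given pair of vertices would have an edge between them in both with probability $O(1/n^2)$. So, $E$ is approximately independent of $G\setminus E$. Thus, for any $v_1 \in N_{r[G\setminus E]}(v)$ and $v_2 \in N_{r'[G\setminus E]}(v')$, $(v_1, v_2) \in E$ with probability approximately $cQ_{\sigma_{v_1},\sigma_{v_2}}/n$. As a result,

$$\begin{aligned}
N_{r,r'[E]}(v\cdot v') &\approx N_{r[G\setminus E]}(v)\cdot \frac{cQ}{n} N_{r'[G\setminus E]}(v')\\
&\approx \big((1-c)PQ\big)^r e_{\sigma_v}\cdot \frac{cQ}{n}\big((1-c)PQ\big)^{r'} e_{\sigma_{v'}}\\
&= c(1-c)^{r+r'}\, e_{\sigma_v}\cdot Q(PQ)^{r+r'} e_{\sigma_{v'}}/n,
\end{aligned}$$

where the last step uses the symmetry of $Q$, so that $((PQ)^r)^T Q = (QP)^r Q = Q(PQ)^r$.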
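The count of Definition 5 and the first-order approximation just derived can be sketched as follows. This is an illustration under our own assumptions: $G\setminus E$ is an adjacency dictionary, $E$ is a list of vertex pairs, communities are 0-indexed, and `pair_count` and `predicted_pair_count` are hypothetical names.

```python
import numpy as np
from collections import deque

def sphere(adj, v, r):
    """N_r(v): vertices at shortest-path distance exactly r from v in the graph adj."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return {u for u, d in dist.items() if d == r}

def pair_count(adj_rest, E, v, v_prime, r, r_prime):
    """N_{r,r'[E]}(v . v'): ordered pairs (v1, v2) with v1 in N_r(v), v2 in N_r'(v'),
    both spheres taken in G \\ E (adj_rest), and {v1, v2} an edge of E."""
    s1 = sphere(adj_rest, v, r)
    s2 = sphere(adj_rest, v_prime, r_prime)
    count = 0
    for a, b in E:
        count += (a in s1 and b in s2) + (b in s1 and a in s2)
    return count

def predicted_pair_count(c, r, r_prime, p, Q, a, b, n):
    """First-order prediction c (1-c)^(r+r') e_a . Q (PQ)^(r+r') e_b / n
    for vertices of communities a and b (0-indexed)."""
    Q = np.asarray(Q, float)
    PQ = np.diag(np.asarray(p, float)) @ Q
    M = Q @ np.linalg.matrix_power(PQ, r + r_prime)
    return c * (1 - c) ** (r + r_prime) * M[a, b] / n

# Tiny hand-made example: G \ E is the path 0-1-2 plus the edge 3-4, and E = {2-3}.
adj_rest = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
E = [(2, 3)]
print(pair_count(adj_rest, E, 0, 4, 2, 1))  # 1: the pair (2, 3)
```

On an actual SBM sample, `pair_count` would only match `predicted_pair_count` approximately, since the derivation above ignores fluctuations and the weak dependence between $E$ and $G\setminus E$.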


Let $\lambda_1, \dots, \lambda_h$ be the distinct eigenvalues of $PQ$, ordered so that $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_h| \ge 0$. Also define $\eta$ so that $\eta = h$ if $\lambda_h \neq 0$ and $\eta = h - 1$ if $\lambda_h = 0$. If $W_i$ is the eigenspace of $PQ$ corresponding to the eigenvalue $\lambda_i$, and $P_{W_i}$ is the projection operator onto $W_i$, then


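$$\begin{aligned}
N_{r,r'[E]}(v\cdot v') &\approx \frac{c(1-c)^{r+r'}}{n}\, e_{\sigma_v}\cdot Q(PQ)^{r+r'} e_{\sigma_{v'}}\\
&= \frac{c(1-c)^{r+r'}}{n}\Big(\sum_i P_{W_i}(e_{\sigma_v})\Big)\cdot Q(PQ)^{r+r'}\Big(\sum_j P_{W_j}(e_{\sigma_{v'}})\Big)\\
&= \frac{c(1-c)^{r+r'}}{n}\sum_i P_{W_i}(e_{\sigma_v})\cdot Q(PQ)^{r+r'} P_{W_i}(e_{\sigma_{v'}})\\
&= \frac{c(1-c)^{r+r'}}{n}\sum_i \lambda_i^{r+r'+1}\, P_{W_i}(e_{\sigma_v})\cdot P^{-1}P_{W_i}(e_{\sigma_{v'}}),
\end{aligned}$$

where the third equality holds because for all $i \neq j$,

$$\begin{aligned}
\lambda_i P_{W_i}(e_{\sigma_v})\cdot P^{-1}P_{W_j}(e_{\sigma_{v'}}) &= \big(PQP_{W_i}(e_{\sigma_v})\big)\cdot P^{-1}P_{W_j}(e_{\sigma_{v'}})\\
&= P_{W_i}(e_{\sigma_v})\cdot QP_{W_j}(e_{\sigma_{v'}})\\
&= \lambda_j P_{W_i}(e_{\sigma_v})\cdot P^{-1}P_{W_j}(e_{\sigma_{v'}}),
\end{aligned}$$

and since $\lambda_i \neq \lambda_j$, this implies that $P_{W_i}(e_{\sigma_v})\cdot P^{-1}P_{W_j}(e_{\sigma_{v'}}) = 0$.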
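The spectral expansion above can be checked numerically. The sketch below (Python, NumPy) builds the projectors $P_{W_i}$ from the symmetric matrix $P^{1/2}QP^{1/2}$, which has the same eigenvalues as $PQ$; this construction, the tolerance used to merge repeated eigenvalues, and the function names are our own choices.

```python
import numpy as np

def spectral_projectors(p, Q, tol=1e-9):
    """Distinct eigenvalues of PQ (P = diag(p)) and the projectors P_{W_i}.

    PQ = P^{1/2} S P^{-1/2} with S = P^{1/2} Q P^{1/2} symmetric, so PQ is
    diagonalizable with real eigenvalues; eigenvalues closer than tol are merged."""
    p = np.asarray(p, float)
    sq = np.sqrt(p)
    S = sq[:, None] * np.asarray(Q, float) * sq[None, :]
    lam, U = np.linalg.eigh(S)
    order = np.argsort(-np.abs(lam))          # |lambda_1| >= |lambda_2| >= ...
    lam, U = lam[order], U[:, order]
    groups = []
    for idx, value in enumerate(lam):
        for g in groups:
            if abs(lam[g[0]] - value) < tol:
                g.append(idx)
                break
        else:
            groups.append([idx])
    eigs, projs = [], []
    for g in groups:
        Ug = U[:, g]
        # Projector onto W along the other eigenspaces: P^{1/2} U_g U_g^T P^{-1/2}.
        projs.append(sq[:, None] * (Ug @ Ug.T) / sq[None, :])
        eigs.append(lam[g[0]])
    return np.array(eigs), projs

def check_identity(p, Q, a, b, m):
    """Both sides of e_a . Q (PQ)^m e_b = sum_i lambda_i^(m+1) P_Wi e_a . P^{-1} P_Wi e_b."""
    p = np.asarray(p, float)
    Q = np.asarray(Q, float)
    P_inv = np.diag(1 / p)
    lhs = (Q @ np.linalg.matrix_power(np.diag(p) @ Q, m))[a, b]
    eigs, projs = spectral_projectors(p, Q)
    e = np.eye(len(p))
    rhs = sum(l ** (m + 1) * (Pw @ e[a]) @ P_inv @ (Pw @ e[b])
              for l, Pw in zip(eigs, projs))
    return lhs, rhs

p = [0.2, 0.3, 0.5]
Q = [[5, 1, 2], [1, 4, 0.5], [2, 0.5, 3]]
print(check_identity(p, Q, 0, 2, 3))  # the two numbers agree up to rounding
```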

That implies that one can approximately solve for $P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}}$ given $N_{r+j,r'[E]}(v\cdot v')$ for all $0 \le j < \eta$. Of course, this requires $r$ and $r'$ to be large enough that

$$\frac{c(1-c)^{r+r'}}{n}\lambda_i^{r+r'+1}\, P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}}$$

is large relative to the error terms for all $i \le \eta$. At a minimum, that requires that $|(1-c)\lambda_\eta|^{r+r'+1} = \omega(n)$, so

$$r + r' > \log(n)/\log((1-c)|\lambda_\eta|).$$

On the flip side, one also needs

$$r, r' < \log(n)/\log((1-c)\lambda_1)$$

because otherwise the graph will start running out of vertices before one gets $r$ steps away from $v$ or $r'$ steps away from $v'$.

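Furthermore, for any $v$ and $v'$,

$$\begin{aligned}
0 &\le P_{W_i}(e_{\sigma_v} - e_{\sigma_{v'}})\cdot P^{-1}P_{W_i}(e_{\sigma_v} - e_{\sigma_{v'}})\\
&= P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_v} - 2P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}} + P_{W_i}e_{\sigma_{v'}}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}},
\end{aligned}$$

with equality for all $i$ if and only if $\sigma_v = \sigma_{v'}$, so sufficiently good approximations of $P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_v}$, $P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}}$, and $P_{W_i}e_{\sigma_{v'}}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}}$ can be used to determine which pairs of vertices are in the same community as follows.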


The Vertex_comparison_algorithm. The inputs are $(v, v', r, r', E, x, c)$, where $v, v'$ are two vertices, $r, r'$ are positive integers, $E$ is a subset of $G$’s edges, $x$ is a positive real number, and $c$ is a real number between 0 and 1.

The algorithm outputs a decision on whether $v$ and $v'$ are in the same community or not. It proceeds as follows.

(1) Solve the systems of equations

$$\sum_i \big((1-c)\lambda_i\big)^{r+r'+j+1} z_i(v\cdot v') = \frac{(1-c)n}{c} N_{r+j,r'[E]}(v\cdot v') \quad \text{for } 0 \le j < \eta,$$

$$\sum_i \big((1-c)\lambda_i\big)^{r+r'+j+1} z_i(v\cdot v) = \frac{(1-c)n}{c} N_{r+j,r'[E]}(v\cdot v) \quad \text{for } 0 \le j < \eta,$$

$$\sum_i \big((1-c)\lambda_i\big)^{r+r'+j+1} z_i(v'\cdot v') = \frac{(1-c)n}{c} N_{r+j,r'[E]}(v'\cdot v') \quad \text{for } 0 \le j < \eta,$$

so that $z_i(v\cdot v')$ approximates $P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}}$, and similarly for the other two.

(2) If $\exists i : z_i(v\cdot v) - 2z_i(v\cdot v') + z_i(v'\cdot v') > 5\big(2x(\min_j p_j)^{-1/2} + x^2\big)$, then conclude that $v$ and $v'$ are in different communities. Otherwise, conclude that $v$ and $v'$ are in the same community.
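Steps (1) and (2) amount to solving an $\eta \times \eta$ Vandermonde-type linear system and applying a coordinate-wise threshold. The Python sketch below assumes the counts $N_{r+j,r'[E]}(\cdot)$ and the eigenvalues $\lambda_1,\dots,\lambda_\eta$ (which must be distinct and nonzero for the system to be invertible) are already available; `solve_z` and `same_community` are hypothetical names, and the example ignores the random error terms.

```python
import numpy as np

def solve_z(lams, c, r, r_prime, counts, n):
    """Step (1): solve sum_i ((1-c) lam_i)^(r+r'+j+1) z_i = ((1-c) n / c) counts[j]
    for 0 <= j < eta, where counts[j] stands for N_{r+j, r'[E]}(. , .) and eta = len(lams)."""
    base = (1 - c) * np.asarray(lams, float)
    eta = len(base)
    A = np.array([base ** (r + r_prime + j + 1) for j in range(eta)])
    rhs = (1 - c) * n / c * np.asarray(counts, float)
    return np.linalg.solve(A, rhs)

def same_community(z_vv, z_vw, z_ww, x, min_p):
    """Step (2): different communities iff some coordinate of
    z(v.v) - 2 z(v.v') + z(v'.v') exceeds 5 (2x (min_j p_j)^(-1/2) + x^2)."""
    gap = np.asarray(z_vv, float) - 2 * np.asarray(z_vw, float) + np.asarray(z_ww, float)
    threshold = 5 * (2 * x / np.sqrt(min_p) + x ** 2)
    return not bool(np.any(gap > threshold))

# Noise-free round trip: build counts from known z and recover them.
lams, c, r, r_prime, n = [3.0, -1.5], 0.1, 2, 1, 1000
z_true = np.array([0.4, -0.2])
counts = [c / ((1 - c) * n) * sum(((1 - c) * l) ** (r + r_prime + j + 1) * z
                                  for l, z in zip(lams, z_true))
          for j in range(len(lams))]
print(solve_z(lams, c, r, r_prime, counts, n))  # close to [0.4, -0.2]
```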


One could generate a reasonable classification based solely on this method of comparing vertices (with an appropriate choice of the parameters, as later detailed). However, that would require computing $N_{r,r'[E]}(v\cdot v')$ for every vertex in the graph with fairly large $r + r'$, which would be slow. Instead, we use the fact that for any vertices $v$, $v'$, and $v''$ with $\sigma_v = \sigma_{v'} \neq \sigma_{v''}$,

$$\begin{aligned}
&P_{W_i}e_{\sigma_{v'}}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}} - 2P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}} + P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_v} = 0\\
&\le P_{W_i}e_{\sigma_{v''}}\cdot P^{-1}P_{W_i}e_{\sigma_{v''}} - 2P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v''}} + P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_v}
\end{aligned}$$

for all $i$, and the inequality is strict for at least one $i$. So, subtracting $P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_v}$ from both sides gives us that

$$P_{W_i}e_{\sigma_{v'}}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}} - 2P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v'}} \le P_{W_i}e_{\sigma_{v''}}\cdot P^{-1}P_{W_i}e_{\sigma_{v''}} - 2P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_{v''}}$$

for all $i$, and the inequality is still strict for at least one $i$.

So, given a representative vertex in each community, we can determine which of them a given vertex $v$ is in the same community as, without needing to know the value of $P_{W_i}e_{\sigma_v}\cdot P^{-1}P_{W_i}e_{\sigma_v}$, as follows.


The Vertex_classification_algorithm. The inputs are $(v[\,], v', r, r', E, x, c)$, where $v[\,]$ is a list of vertices, $v'$ is a vertex, $r, r'$ are positive integers, $E$ is a subset of $G$’s edges, $x$ is a positive real number, and $c$ is a real number between 0 and 1. It is assumed that values $z_i(v[\sigma]\cdot v[\sigma])$ satisfying

$$\sum_i \big((1-c)\lambda_i\big)^{r+r'+j+1} z_i(v[\sigma]\cdot v[\sigma]) = \frac{(1-c)n}{c} N_{r+j,r'[E]}(v[\sigma]\cdot v[\sigma]) \quad \text{for } 0 \le j < \eta$$

have already been computed for every $v[\sigma] \in v[\,]$.

The algorithm is supposed to output $\sigma$ such that $v'$ is in the same community as $v[\sigma]$. It works as follows.

(1) For each $\sigma$, solve the system of equations

$$\sum_i \big((1-c)\lambda_i\big)^{r+r'+j+1} z_i(v[\sigma]\cdot v') = \frac{(1-c)n}{c} N_{r+j,r'[E]}(v[\sigma]\cdot v') \quad \text{for } 0 \le j < \eta.$$

(2) If there exists a unique $\sigma$ such that for all $\sigma' \neq \sigma$ and all $i$,

Zi(v[o] ■ v[o]) - 2zi(v[o] ■ v') < Zi{v[o'} ■ v[o']) - 2zi{v[o'} ■ v') + y • (2x(min pj)~ 1/2 + x 2 ) 

then conclude that $v'$ is in the same community as $v[\sigma]$.

(3) Otherwise, fail.
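Step (2) of the classification procedure can be sketched as below. The sketch compares the scores $z_i(v[\sigma]\cdot v[\sigma]) - 2z_i(v[\sigma]\cdot v')$ across representatives and takes the whole additive slack on the right-hand side of the test as a parameter `margin`; this parametrization and the function name are our own assumptions.

```python
import numpy as np

def classify_vertex(z_rep, z_cross, margin):
    """Step (2) of Vertex_classification_algorithm (sketch).

    z_rep[s]   : vector z(v[s] . v[s]) for representative s (precomputed)
    z_cross[s] : vector z(v[s] . v') from step (1)
    margin     : additive slack on the right-hand side of the test (a parameter here)
    Returns the unique s passing the test, or None for "Fail"."""
    scores = [np.asarray(zr, float) - 2 * np.asarray(zc, float)
              for zr, zc in zip(z_rep, z_cross)]
    winners = [s for s in range(len(scores))
               if all(np.all(scores[s] < scores[t] + margin)
                      for t in range(len(scores)) if t != s)]
    return winners[0] if len(winners) == 1 else None

z_rep = [[1.0, 1.0], [1.0, 1.0]]
z_cross = [[0.5, 0.5], [-0.5, 0.0]]
print(classify_vertex(z_rep, z_cross, margin=0.1))  # 0
print(classify_vertex(z_rep, z_cross, margin=5.0))  # None: both pass, not unique
```

A slack that is too large makes several representatives pass and the procedure fails, which is why the uniqueness requirement in step (2) matters.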

This runs fairly quickly if $r$ is large and $r'$ is small, because for $v'$ the algorithm only requires exploring the spheres of radius at most $r'$ around $v'$. This leads to the following plan for partial recovery. First, randomly select a set of vertices that is large enough to contain at least one vertex from each community with high probability. Next, compare all of the selected vertices in an attempt to determine which of them are in the same communities. Then, pick one in each community. After that, use the algorithm above to attempt to determine which community each of the remaining vertices is in. As long as there actually was at least one vertex from each community in the initial set and none of the approximations were particularly bad, this should give a reasonably accurate classification.

The Unreliable graph classification algorithm. The inputs are $(G, c, m, \epsilon, x)$, where $G$ is a graph, $c$ is a real number between 0 and 1, $m$ is a positive integer, $\epsilon$ is a real number between 0 and 1, and $x$ is a positive real number.

The algorithm outputs an alleged list of communities for $G$. It works as follows.

(1) Randomly assign each edge in $G$ to $E$ independently with probability $c$.

(2) Randomly select $m$ vertices in $G$, $v[0], \dots, v[m-1]$.

(3) Set $r = (1 - \frac{\epsilon}{3})\log n/\log((1-c)\lambda_1) - \eta$ and $r' = \frac{2\epsilon}{3}\cdot\log n/\log((1-c)\lambda_1)$.

(4) Compute $N_{r''[G\setminus E]}(v[i])$ for each $r'' \le r + \eta$ and $0 \le i < m$.

(5) Run Vertex_comparison_algorithm$(v[i], v[j], r, r', E, x, c)$ for every $i$ and $j$.

(6) If these give results consistent with some community memberships which indicate that there is at least one vertex in each community in $v[\,]$, randomly select one alleged member $v'[\sigma]$ of each community $\sigma$. Otherwise, fail.

(7) For every $v''$ in the graph, compute $N_{r''[G\setminus E]}(v'')$ for each $r'' \le r'$. Then, run Vertex_classification_algorithm$(v'[\,], v'', r, r', E, x, c)$ in order to get a hypothesized classification of $v''$.

(8) Return the resulting classification. 

The risk that this randomly gives a bad classification due to a bad set of initial vertices 
can be mitigated as follows. First, repeat the previous classification procedure several times. 
Next, discard any classification that differs too badly from the majority. Assuming that 
the procedure gives a good classification more often than not, this should eliminate any 
really bad classification. Finally, average the remaining classifications together. This last 
procedure completes the Sphere comparison algorithm. 

The Reliable graph classification algorithm (i.e., Sphere comparison). The inputs are $(G, c, m, \epsilon, x, T(n))$, where $G$ is a graph, $c$ is a real number between 0 and 1, $m$ is a positive integer, $\epsilon$ is a real number between 0 and 1, $x$ is a positive real number, and $T$ is a function from the positive integers to itself.

The algorithm outputs an alleged list of communities for $G$. It works as follows.

(1) Run Unreliable_graph_classification_algorithm$(G, c, m, \epsilon, x)$ $T(n)$ times and record the resulting classifications.

(2) Discard any classification that has greater than 

(1 —&)x 2 A2 minp (1- c)x 2 \l m in Pl , (1—C)A f 

4ke 16Aj>(l+a:) _ e leAifcCl+e) U 4A3 ^ 

disagreement with more than half of the other classifications. In this step, define the 
disagreement between two classifications as the minimum disagreement over all bijections 
between their communities. 

(3) Let $\{\sigma[i]\}$ be the remaining classifications. For each vertex $v \in G$, randomly select some $i$ and assert that $\sigma_v$ is the $j$ that maximizes $|\{v' : \sigma[1]_{v'} = j\} \cap \{v' : \sigma[i]_{v'} = \sigma[i]_v\}|$. In other words, assume that $\sigma[i]$ classifies $v$ correctly and then translate that to a community of $\sigma[1]$ by assuming the communities of $\sigma[i]$ correspond to the communities of $\sigma[1]$ that they have the greatest overlap with.

(4) Return the resulting combined classification.
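Steps (2) and (3) rely on comparing classifications up to a relabelling of the communities. A brute-force Python sketch of the disagreement (minimum over all bijections, feasible only for small $k$) and of the combination rule in step (3) follows; the function names and the quadratic-time implementation are illustrative assumptions, not the paper's implementation.

```python
import random
from itertools import permutations

def disagreement(a, b, k):
    """Minimum over bijections pi of {0..k-1} of #{v : pi(a[v]) != b[v]} (brute force)."""
    return min(sum(perm[x] != y for x, y in zip(a, b))
               for perm in permutations(range(k)))

def combine(classifications, k, rng):
    """Step (3): for each vertex v pick a random classification s and map s[v] to the
    label j of the reference classifications[0] maximizing
    |{v' : ref[v'] = j} & {v' : s[v'] = s[v]}|."""
    ref = classifications[0]
    n = len(ref)
    out = []
    for v in range(n):
        s = classifications[rng.randrange(len(classifications))]
        members = [u for u in range(n) if s[u] == s[v]]
        overlap = [sum(ref[u] == j for u in members) for j in range(k)]
        out.append(max(range(k), key=overlap.__getitem__))
    return out

print(disagreement([0, 0, 1, 1, 2], [1, 1, 0, 2, 2], 3))            # 1
print(combine([[0, 0, 1, 1], [1, 1, 0, 0]], 2, random.Random(0)))  # [0, 0, 1, 1]
```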

If the conditions of Theorem 2 are satisfied, then there exists $x$ such that for all sufficiently small $c$,

Reliable_graph_classification_algorithm$(G, c, \ln(4/c)/\min_i p_i, \epsilon, x, \ln n)$

classifies at least

$$1 - \frac{4k\,e^{-\frac{C\rho}{16k}}}{1 - e^{-\frac{C\rho}{16k}}} \qquad (17)$$

of $G$’s vertices correctly with probability $1 - o(1)$, and it runs in $O(n^{1+\epsilon})$ time.


3.2 Exact recovery and the Degree-profiling algorithm 

With our previous result achieving almost exact recovery of the nodes, we are in a position to 
complete the exact recovery via a procedure that performs local improvements on the rough 
solution. While, the exact recovery requirement is rather strong, we show that it benefits 
from a phase transition, as opposed to almost exact recovery, which allows to benchmark 
algorithms on a sharp limit (see Introduction). 

Our analysis of exact recovery relies on the fact that the probability distribution of the numbers of neighbors a given vertex has in each community is essentially a multivariate Poisson distribution. We hence investigate a hypothesis testing problem (see Section 8.2), where a node in the SBM graph with known clusters (up to $o(n)$ errors due to our previous results) is taken and re-classified based on its degree profile, i.e., on the number of neighbors it has in each community. This requires testing between $k$ multivariate Poisson distributions of different means $m_1, \dots, m_k \in \mathbb{R}_+^k$, where $m_i = \ln(n)\theta_i$ for $\theta_i = (PQ)_i$, $i \in [k]$. The error probability of the optimal test depends on the degree of overlap between any two of these Poisson distributions, which we show is either $o(1/n)$ or $\omega(1/n)$. This is where the CH-divergence emerges as the exponent for the error probability. It is captured by the following sharp estimate derived in Section 8.2, where $\mathcal{P}_c$ denotes the Poisson distribution of mean $c$.

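Theorem 3. (See Lemma 11.) For any $\theta_1, \theta_2 \in (\mathbb{R}_+ \setminus \{0\})^k$ with $\theta_1 \neq \theta_2$ and $p_1, p_2 \in \mathbb{R}_+ \setminus \{0\}$,

$$\sum_{x \in \mathbb{Z}_+^k} \min\big(\mathcal{P}_{\ln(n)\theta_1}(x)p_1, \mathcal{P}_{\ln(n)\theta_2}(x)p_2\big) = \Theta\big(n^{-D_+(\theta_1,\theta_2) - o(1)}\big),$$

where $D_+(\theta_1, \theta_2)$ is the CH-divergence as defined in (3).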
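Theorem 3 can be explored numerically for small $k$. The sketch below evaluates the truncated sum on a grid (so it is only practical for $k = 2$ or 3), working with log-probabilities to avoid underflow; the truncation level `cutoff` and the function name are our own choices. For $\theta_1 = (4.5, 0.5)$ and $\theta_2 = (0.5, 4.5)$ one has $D_+(\theta_1,\theta_2) = 2$, attained at $t = 1/2$, and the sum with $p_1 = p_2 = 1$ is at most $n^{-2}$ because $\min(a,b) \le \sqrt{ab}$; the empirical exponent should therefore be slightly above 2, the excess reflecting the $o(1)$ term.

```python
import numpy as np
from math import lgamma, log

def poisson_overlap(theta1, theta2, p1, p2, n, cutoff=400):
    """Truncated sum over x in {0..cutoff}^k of min(P_{ln(n) theta1}(x) p1, P_{ln(n) theta2}(x) p2),
    where P_m is the law of k independent Poisson(m_j) variables.
    Entries of theta1, theta2 must be positive."""
    L = log(n)
    xs = np.arange(cutoff + 1)
    log_fact = np.array([lgamma(x + 1.0) for x in xs])
    k = len(theta1)
    log_a, log_b = log(p1), log(p2)
    for j in range(k):
        shape = [1] * k
        shape[j] = cutoff + 1
        # Add coordinate j's Poisson log-pmf along axis j; broadcasting builds the k-dim grid.
        log_a = log_a + (xs * log(L * theta1[j]) - L * theta1[j] - log_fact).reshape(shape)
        log_b = log_b + (xs * log(L * theta2[j]) - L * theta2[j] - log_fact).reshape(shape)
    return float(np.exp(np.minimum(log_a, log_b)).sum())

n = 10**6
overlap = poisson_overlap([4.5, 0.5], [0.5, 4.5], 1.0, 1.0, n)
print(-log(overlap) / log(n))  # empirical exponent, slightly above D_+ = 2
```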

Using this result, we show that, depending on the parameters of the SBM, the error probability of the optimal test is either $o(1/n)$ or $\omega(1/n)$, according to the value of $\min_{i<j} D_+(\theta_i, \theta_j)$. If the error probability is $\omega(1/n)$, then any method of distinguishing between vertices in those two communities must fail with probability $\omega(1/n)$, so any possible algorithm attempting to distinguish between them must misclassify at least one vertex with probability $1 - o(1)$. On the other hand, if the degree of overlap between all communities we are trying to distinguish between is $o(1/n)$, then with probability $1 - o(1)$ one could correctly classify every vertex in the graph if one knew what community each of its neighbors was in. There exists $\delta$ such that attempting to classify a vertex based on classifications of its neighbors that are wrong with probability $x$ results in a probability of misclassifying the vertex that is only $n^{\delta x}$ times as high as it would be if they were all classified correctly. Based on this, the obvious approach to exact recovery would be to use a partial recovery algorithm to create a preliminary classification and then attempt to determine which family of communities each vertex is in based on its neighbors’ alleged communities. However, the standard partial recovery algorithm has a constant error rate, so this procedure’s output would have an error rate $n^c$ times as large as if each vertex were being classified based on its neighbors’ true communities, for some $c > 0$. If the degrees of overlap are only barely below $1/n$, then this would increase the error rate enough that this procedure would misclassify at least one vertex with high probability.

Instead, we go through three successively more accurate classifications as follows. Given a partial reconstruction of the communities with an error rate that is a sufficiently low constant, one can classify vertices based on their neighbors’ alleged communities with an accuracy of $1 - O(n^{-c})$ for some constant $c > 0$. Then one can use this classification of a vertex’s neighbors to determine which family of communities it is in with an accuracy of $1 - o\big(\frac{1}{n}\cdot n^{\delta\cdot O(n^{-c})}\big) = 1 - o(1/n)$, since $n^{\delta\cdot O(n^{-c})} = e^{O(\ln(n)\,n^{-c})} = 1 + o(1)$. Therefore, by a union bound over the $n$ vertices, the resulting classification is correct with probability $1 - o(1)$.

We formulate the algorithm in an adaptive way, where we first identify which communities 
can be exactly recovered with the notion of “finest partition,” and then proceed to extract 
this partition. In other words, even in the case where not all communities can be exactly 
recovered, the algorithm may be able to fully extract a subset of the communities. Overall, 
the algorithm for exact recovery works as follows. 

The Degree-profiling algorithm. The inputs are $(G, \gamma)$, where $G$ is a graph, and $\gamma \in [0,1]$ (see Theorem 6 for how to set $\gamma$). The algorithm outputs an assignment of each vertex to one of the groups of communities $\{A_1, \dots, A_t\}$, where $A_1, \dots, A_t$ is the partition of $[k]$ into the largest number of subsets such that $D_+((PQ)_i, (PQ)_j) > 1$ for all $i, j$ in $[k]$ that are in different subsets (i.e., the “finest partition,” see Figure 1). It does the following:

(1) Define the graph $g'$ on the vertex set $[n]$ by selecting each edge in $g$ independently with probability $\gamma$, and define the graph $g''$ that contains the edges in $g$ that are not in $g'$.

(2) Run Sphere-comparison on $g'$ to obtain the preliminary classification $\sigma' \in [k]^n$ (see Section 7).

(3) Determine the edge density between each pair of alleged communities, and use this information and the alleged communities’ sizes to attempt to identify the communities up to symmetry.

(4) For each node $v \in [n]$, determine which community node $v$ is most likely to belong to based on its degree profile computed from the preliminary classification $\sigma'$ (see Section 8.2), and call the result $\sigma''$.

(5) For each node $v \in [n]$, determine which group $A_1, \dots, A_t$ node $v$ is most likely to belong to based on its degree profile computed from the preliminary classification $\sigma''$ (see Section 8.2).
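Steps (4) and (5), together with the finest partition, can be sketched as follows. The code assumes the Poisson means of each community are supplied by the caller (for instance $\ln(n)\theta_i$, possibly rescaled to account for the edges removed when forming $g''$, which is our assumption), and computes the finest partition as the connected components of the graph joining $i$ and $j$ whenever $D_+((PQ)_i,(PQ)_j) \le 1$, from a precomputed matrix of divergences. All names are hypothetical.

```python
import numpy as np
from math import exp, lgamma, log

def degree_profile(v, adj, sigma, k):
    """Number of neighbours of v in each alleged community (labels 0..k-1)."""
    prof = np.zeros(k, dtype=int)
    for u in adj[v]:
        prof[sigma[u]] += 1
    return prof

def log_score(profile, mean, prior):
    """log(prior * P_mean(profile)) for independent Poisson coordinates (mean > 0)."""
    return log(prior) + sum(x * log(m) - m - lgamma(x + 1) for x, m in zip(profile, mean))

def most_likely_community(profile, means, priors):
    """Step (4): argmax_i p_i P_{m_i}(profile)."""
    scores = [log_score(profile, m, pi) for m, pi in zip(means, priors)]
    return int(np.argmax(scores))

def most_likely_group(profile, means, priors, groups):
    """Step (5): argmax over groups A of sum_{i in A} p_i P_{m_i}(profile)."""
    scores = [log_score(profile, m, pi) for m, pi in zip(means, priors)]
    top = max(scores)
    weights = [sum(exp(scores[i] - top) for i in g) for g in groups]
    return int(np.argmax(weights))

def finest_partition(div):
    """Connected components of the graph on [k] joining i and j when div[i][j] <= 1,
    where div[i][j] = D_+((PQ)_i, (PQ)_j) is supplied by the caller."""
    k = len(div)
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            if div[i][j] <= 1:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(k):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

means = [[10.0, 1.0], [1.0, 10.0]]
print(most_likely_community([9, 2], means, [0.5, 0.5]))          # 0
print(finest_partition([[0, 0.5, 3], [0.5, 0, 2], [3, 2, 0]]))   # [[0, 1], [2]]
```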


4 Overlapping communities 

We now define a model that accounts for overlapping communities, which we refer to as the overlapping stochastic block model (OSBM).

Definition 6. Let $n, t \in \mathbb{Z}_+$, $f : \{0,1\}^t \times \{0,1\}^t \to [0,1]$ symmetric, and $p$ a probability distribution on $\{0,1\}^t$. A random graph with distribution OSBM$(n, p, f)$ is generated on the vertex set $[n]$ by drawing independently for each $v \in [n]$ the vector-labels (or user profiles) $X(v)$ under $p$, and by drawing independently for each $u, v \in [n]$, $u < v$, an edge between $u$ and $v$ with probability $f(X(u), X(v))$.

Example 1. One may consider $f(x, y) = \theta_g(x, y)$, where $x_i$ encodes whether a node is in community $i$ or not, and

$$\theta_g(x, y) = g(\langle x, y\rangle), \qquad (18)$$

where $\langle x, y\rangle = \sum_{i=1}^t x_i y_i$ counts the number of common communities between the labels $x$ and $y$, and $g : \{0, 1, \dots, t\} \to [0, 1]$ is a function that maps the overlap score into probabilities ($g$ is typically increasing).

Example 2. As a special case of the previous example, one may consider that a connection takes place between each pair of nodes as follows: each community (i.e., each component in the user profile) generates a connection independently with probability $q_+$ if the two nodes are in that community (i.e., if that component is 1 for both profiles), and multiple connections are equivalent to one connection. We also assume that any pair of nodes without a common community connects with probability $q_-$, so that

$$g(s) = \begin{cases} 1 - (1 - q_+)^s, & \text{if } s \neq 0,\\ q_-, & \text{if } s = 0. \end{cases} \qquad (19)$$

If we consider $q_-$ and $q_+$ to be vanishing, like $\Theta(\log(n)/n)$, we may consider the equivalent model where

$$g(s) = \begin{cases} s\,q_+, & \text{if } s \neq 0,\\ q_-, & \text{if } s = 0. \end{cases} \qquad (20)$$

If $t = 1$, this model collapses to the usual symmetric stochastic block model with non-overlapping communities.

Note that in general we can represent the OSBM as an SBM with $k = 2^t$ communities, where each community represents a possible profile in $\{0,1\}^t$. For example, two overlapping communities can be modelled by assigning nodes with a single attribute $(1,0)$ and $(0,1)$ to each of the disjoint communities and nodes with both attributes $(1,1)$ to the overlap community, while nodes having none of the attributes, i.e., $(0,0)$, may be assigned to the null community.
Assume now that we identify community $i \in [k]$ with the profile corresponding to the binary expansion of $i - 1$. The prior and connectivity matrix of the corresponding SBM are then given by

$$p_i = p(b(i)), \qquad (21)$$

$$Q_{i,j} = f(b(i), b(j)), \qquad (22)$$

where $b(i)$ is the binary expansion of $i - 1$, and

$$\mathrm{OSBM}(n, p, f) = \mathrm{SBM}(n, p, Q). \qquad (23)$$
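The correspondence (21)-(22) between the OSBM and an SBM with $2^t$ communities is mechanical; the Python sketch below builds $p$ and $Q$ from a profile distribution and a connection function $f$, using $g$ from (19) through $\theta_g$ of (18). Indices are 0-based in the code (so code index $i$ corresponds to community $i+1$ and profile $b(i+1)$), and all function names and numerical values are hypothetical.

```python
import numpy as np

def osbm_to_sbm(t, profile_prob, f):
    """(p, Q) of the SBM with k = 2^t communities from (21)-(22).
    Code index i (0-based) stands for community i + 1, i.e. profile b(i + 1) = binary(i)."""
    k = 2 ** t
    profiles = [tuple(int(bit) for bit in format(i, f"0{t}b")) for i in range(k)]
    p = np.array([profile_prob(b) for b in profiles], dtype=float)
    Q = np.array([[f(bi, bj) for bj in profiles] for bi in profiles], dtype=float)
    return p, Q, profiles

def theta_g(g):
    """f(x, y) = g(<x, y>), <x, y> = number of shared communities, as in (18)."""
    return lambda x, y: g(sum(a * b for a, b in zip(x, y)))

def g_example2(q_plus, q_minus):
    """g from (19): 1 - (1 - q_plus)^s if s != 0, and q_minus if s = 0."""
    return lambda s: 1 - (1 - q_plus) ** s if s != 0 else q_minus

# Two overlapping communities with a uniform profile distribution (illustrative values).
p, Q, profiles = osbm_to_sbm(2, lambda b: 0.25, theta_g(g_example2(0.5, 0.1)))
print(profiles)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(Q)
```

In this example the null profile $(0,0)$ connects to everything with probability $q_- = 0.1$, while two nodes in both communities connect with probability $1 - (1 - 0.5)^2 = 0.75$.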

We can then use the results of previous sections to obtain exact recovery in the OSBM. 

Corollary 3. Exact recovery is solvable for the OSBM if the conditions of Theorem 6 apply to the SBM$(n, p, Q)$ with $p$ and $Q$ as defined in (21), (22).

5 Further literature 

The stochastic block model was first introduced in [HLL83], and in [BCLS87, DF89] as the 
planted bisection model. For the first three decades, a major portion of the literature has 
focused on exact recovery, in particular on the case with two symmetric communities. The 
table below summarizes a partial list of works for exact recovery: 


Work | Method | Condition
Bui, Chaudhuri, Leighton, Sipser ’84 | min-cut method | $p = \Omega(1/n)$, $q = o(n^{-1-4/((p+q)n)})$
Dyer, Frieze ’89 | min-cut via degrees | $p - q = \Omega(1)$
Boppana ’87 | spectral method | $(p - q)/\sqrt{p + q} = \Omega(\sqrt{\log(n)/n})$
Snijders, Nowicki ’97 | EM algorithm | $p - q = \Omega(1)$
Jerrum, Sorkin ’98 | Metropolis algorithm | $p - q = \Omega(n^{-1/6+\epsilon})$
Condon, Karp ’99 | augmentation algorithm | $p - q = \Omega(n^{-1/2+\epsilon})$
Carson, Impagliazzo ’01 | hill-climbing algorithm | $p - q = \Omega(n^{-1/2}\log^4(n))$
McSherry ’01 | spectral method | $(p - q)/\sqrt{p} \ge \Omega(\sqrt{\log(n)/n})$
Bickel, Chen ’09 | N-G modularity | $(p - q)/\sqrt{p + q} = \Omega(\log(n)/\sqrt{n})$
Rohe, Chatterjee, Yu ’11 | spectral method | $p - q = \Omega(1)$


These works display an impressive diversity of algorithms, but are mainly driven by the 
methodology and do not reveal the sharp behavioral transition that takes place in this 
model, as later shown in [ABH14, MNSb] (see below). Before discussing these results, 
one should mention that various other works have considered recovery algorithms for 
multiple communities without identifying phase transitions. We refer to [CSX12, YC14] for 
a summary of these results. In particular, [YC14] has recently studied information-theoretic 
vs. computational tradeoffs in coarse regimes of the parameters for symmetric block models 
with a growing number of communities. 

Phase transition phenomena for the SBM appeared first for weak recovery. In 2010, Coja-Oghlan [Co10] introduced the weak-recovery problem, and obtained bounds for the constant average degree regime using a spectral algorithm. Soon after, [DKMZ11] proposed a precise picture for weak recovery using statistical physics arguments, with a sharp threshold conjectured at $(a - b)^2 = 2(a + b)$, when $a = pn$ and $b = qn$. This has opened the door to a new series of works on the SBM driven by phase transitions. The impossibility part of the conjecture was first proved in [MNS12], using a reduction to broadcasting on trees [EKPS00], and the conjecture was fully established in 2014 with [Mas14, MNS14].

Recently it was realized that exact recovery also admits a phase transition phenomenon. This was established in [ABH14], and shortly after in [MNSb], with the threshold located$^{10}$ at $\sqrt{a} - \sqrt{b} = \sqrt{2}$ when $a = pn/\ln(n)$ and $b = qn/\ln(n)$. Efficient algorithms were also obtained in these papers. Hence, weak and exact recovery are solved in the symmetric two-community SBM.

One should also mention a line of work on another community detection model called the Censored Block Model (CBM), studied in [AM13, ABBS14a]. This model and its variants were also studied in [AM, HG13, CHG, CG14, ABBS14b, GRSY14]. An SDP relaxation as in [ABH14] for the SBM was first proposed in [ABBS14a] for the CBM, with a performance gap of roughly a factor 2. This gap was recently closed in [BH14]. SDP relaxations for block models were also studied in [YC14, AL14, GV14]. Note that SDP algorithms run in polynomial time but are far from quasi-linear time. For the CBM, recent works [CRV15, SKLZ15] obtained tight bounds for weak recovery using spectral methods.

Two recent works [GV14, CRV15] have also obtained bounds for partial recovery in the 
SBM with multiple communities, for the case of symmetric blocks or with bounds on the 
connectivity probabilities in terms of symmetric blocks. No phase transitions for exact or 
weak recovery have yet been proved for the SBM with more than two communities. 

6 Open problems 

Several extensions would be interesting for the SBM with specified parameters, such as considering parameters that vary with $n$, in particular for the number of communities, or communities of sub-linear sizes. Part of the results obtained in this paper should extend without much difficulty to some of these cases. It would also be interesting to investigate how the complexity of algorithms scales with the number of communities.$^{11}$ It would also be important to obtain results and algorithms that do not rely on knowledge of the model parameters. Here also, some of the techniques in this paper may extend.

For partial recovery, it would be interesting to obtain tight upper bounds on the accuracy of the reconstruction in the general SBM, in particular for the regime of large constant degrees, to check whether the bound obtained in this paper is tight. For the symmetric case, the information-theoretic and computational thresholds for weak recovery remain open for more than 2 communities.

Finally, there are many other interesting models to investigate, such as the Censored Block Model [AM13, ABBS14a, ABBS14b, AM, HG13, CHG, CG14, GRSY14], the Labelled Block Model [HLM12, XLM14] and many more. It would be natural to expect that for these models as well, an information measure à la CH-divergence obtained in this paper determines the recovery threshold. Obtaining such variants would provide major insight towards a theory for community detection in general network models.

10 [MNSb] allows for a slightly more general model where $a$ and $b$ are $\Theta(1)$ and gives the behaviour at the threshold. Note that at the threshold, one has to distinguish the case of $b = 0$ and $b > 0$ (assuming $a > b$), since for $b = 0$ the clusters are not connected whp and it is not possible to recover the clusters with a vanishing error probability.

11 In [YC14] this question is studied for coarser regimes of the parameters.


7 Partial Recovery 

7.1 Formal results 

Theorem 4. Given any $k \in \mathbb{Z}$, $p \in (0,1)^k$ with $|p| = 1$, and symmetric matrix $Q$ with no two rows equal, let $\lambda$ be the largest eigenvalue of $PQ$, and $\lambda'$ be the eigenvalue of $PQ$ with the smallest nonzero magnitude. For any


0 < x < min 


$$3\max\Big[\frac{\ln(\lambda^2/(\lambda')^2)}{\ln((\lambda')^2/\lambda)}, \frac{\ln(2\lambda^2/(\lambda')^2)}{\ln(2\lambda^3/(\lambda')^2)}\Big] < \epsilon < 1,$$
A k 


( | ^ -,-(min pi) 1/2 + J 1/minpi + min \P w {ei - ej) ■ P 1 P\y(e i - ej) |/13 

\ | A' | mm pi V 


where $P_W(e_i - e_j)$ is the projection of $e_i - e_j$ onto $W$, and the last min is taken over all communities $i, j$ and eigenspaces $W$ of $PQ$ such that $P_W(e_i) \neq P_W(e_j)$, and


$$\frac{2k\,e^{-\frac{x^2(\lambda')^2\min_i p_i}{16\lambda k^{3/2}((\min_i p_i)^{-1/2} + x)}}}{1 - e^{-\frac{x^2(\lambda')^2\min_i p_i}{16\lambda k^{3/2}((\min_i p_i)^{-1/2} + x)}}} < y < \frac{\min_i p_i}{4\ln(4k)}$$

(which may not exist$^{12}$), there exists an algorithm that detects communities in graphs drawn from $G_1(n,p,Q)$ with accuracy at least $1 - 2y$ with probability at least $1 - o(1)$, and runs in $O(n^{1+\epsilon})$ time.


We refer to Section 2.2 for a less technical statement of the theorem. We next provide 
further details on the example provided in Section 2.2, in particular concerning the constants. 

Corollary 4. Consider the $k$-block symmetric case. In other words, $p_i = \frac{1}{k}$ for all $i$, and $Q_{i,j}$ is $\alpha$ if $i = j$ and $\beta$ otherwise. The vector whose entries are all 1s is an eigenvector of $PQ$ with eigenvalue $\frac{\alpha + (k-1)\beta}{k}$, and every vector whose entries add up to 0 is an eigenvector of $PQ$ with eigenvalue $\frac{\alpha - \beta}{k}$. So, $\lambda = \frac{\alpha + (k-1)\beta}{k}$ and $\lambda' = \frac{\alpha - \beta}{k}$, and

$$\rho := \frac{(\lambda')^2}{\lambda} = \frac{(\alpha - \beta)^2}{k(\alpha + (k-1)\beta)}, \qquad (24)$$

which is the signal-to-noise ratio appearing in the conjectures on the detection threshold for multiple blocks [DKMZ11, MNS12]. We then have

$$\min_{\substack{i,j,\ W \in \text{eigenspaces of } PQ:\\ P_W(e_i) \neq P_W(e_j)}} |P_W(e_i - e_j)\cdot P^{-1}P_W(e_i - e_j)|/13 \qquad (25)$$

$$= |(e_1 - e_2)\cdot P^{-1}(e_1 - e_2)|/13 = 2k/13. \qquad (26)$$


So,

$$0 < x < \sqrt{15k/13} - \sqrt{k},$$

and as long as $k(\alpha + (k-1)\beta)^7 < (\alpha - \beta)^8$ and $4k(\alpha + (k-1)\beta)^3 < (\alpha - \beta)^4$, there exists an algorithm that detects communities, and the accuracy is

$$1 - O\big(e^{-c(\alpha - \beta)^2/(k(\alpha + (k-1)\beta))}\big)$$

for sufficiently large $(\alpha - \beta)^2/(k(\alpha + (k-1)\beta))$, where $c = x^2/(16k^{5/2}(x + \sqrt{k}))$.

12 These parameters will exist in our applications of the theorem.
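The quantities in the $k$-block symmetric case are easy to check numerically. The sketch below compares $\lambda$ and $\lambda'$ with the eigenvalues of $PQ$, evaluates the signal-to-noise ratio $(\alpha-\beta)^2/(k(\alpha+(k-1)\beta))$, and tests the two conditions on $\alpha$ and $\beta$ stated above; the function names are our own.

```python
import numpy as np

def symmetric_case(k, alpha, beta):
    """p_i = 1/k, Q = alpha on the diagonal and beta elsewhere."""
    p = np.full(k, 1.0 / k)
    Q = np.full((k, k), float(beta)) + (alpha - beta) * np.eye(k)
    eig = np.linalg.eigvals(np.diag(p) @ Q).real
    lam = (alpha + (k - 1) * beta) / k       # eigenvalue of the all-ones vector
    lam_prime = (alpha - beta) / k           # eigenvalue of zero-sum vectors
    rho = (alpha - beta) ** 2 / (k * (alpha + (k - 1) * beta))
    return eig, lam, lam_prime, rho

def epsilon_conditions_hold(k, alpha, beta):
    """k (alpha + (k-1) beta)^7 < (alpha - beta)^8 and 4k (alpha + (k-1) beta)^3 < (alpha - beta)^4."""
    s = alpha + (k - 1) * beta
    return k * s ** 7 < (alpha - beta) ** 8 and 4 * k * s ** 3 < (alpha - beta) ** 4

eig, lam, lam_prime, rho = symmetric_case(3, 10, 2)
print(sorted(eig), lam, lam_prime, rho)     # eigenvalues 8/3, 8/3, 14/3; rho = 32/21
print(epsilon_conditions_hold(3, 10, 2))    # False
print(epsilon_conditions_hold(2, 100, 1))   # True
```

With $k = 3$, $\alpha = 10$, $\beta = 2$ the first condition fails ($3\cdot 14^7 > 8^8$), while $k = 2$, $\alpha = 100$, $\beta = 1$ satisfies both; integer inputs keep these comparisons exact.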

Considering the way $\epsilon$, $x$, and $y$ scale when $Q$ is multiplied by a scalar yields the following corollary.

Corollary 5. For any $k \in \mathbb{Z}$, $p \in (0,1)^k$ with $|p| = 1$, and symmetric matrix $Q$ with no two rows equal, there exist $\epsilon(\delta) = O(1/\ln(\delta))$ and a constant $c_1 > 0$ such that for all sufficiently large $\delta$, Sphere-comparison detects communities in graphs drawn from $G_1(n, p, \delta Q)$ with accuracy at least $1 - O_\delta(e^{-c_1\delta})$ in $O_n(n^{1+\epsilon(\delta)})$ time for all sufficiently large $n$.

Corollary 6. For any $k \in \mathbb{Z}$, $p \in (0,1)^k$ with $|p| = 1$, symmetric matrix $Q$ with no two rows equal, $b > 0$, and $1 > \epsilon > 0$, there exists $c > 0$ such that Sphere-comparison detects communities in graphs drawn from $G_1(n, p, cQ)$ with accuracy at least $1 - b$ in $O(n^{1+\epsilon})$ time for sufficiently large $n$.

If, instead of having constant average degree, one has an average degree which increases as $n$ increases, one can slowly reduce $b$ and $\epsilon$ as $n$ increases, leading to the following corollary.

Corollary 7. For any $k \in \mathbb{Z}$, $p \in [0,1]^k$ with $|p| = 1$, symmetric matrix $Q$ with no two rows equal, and $c(n)$ such that $c = \omega(1)$, Sphere-comparison detects the communities with accuracy $1 - o(1)$ in $G_1(n, p, c(n)Q)$ and runs in $o(n^{1+\epsilon})$ time for all $\epsilon > 0$.

These corollaries are important as they show that if the entries of the connectivity matrix $Q$ are amplified by a coefficient growing with $n$, almost exact recovery is achieved by Sphere-comparison.

7.2 Proof of Theorem 4 

Proving Theorem 4 will require establishing some terminology. First, let $\lambda_1, \dots, \lambda_h$ be the distinct eigenvalues of $PQ$, ordered so that $|\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_h| \ge 0$. Also define $\eta$ so that $\eta = h$ if $\lambda_h \neq 0$ and $\eta = h - 1$ if $\lambda_h = 0$. In addition to this, let $d$ be the largest sum of a column of $PQ$.

Definition 7. For any graph $G$ drawn from $G_1(n,p,Q)$ and any set $V$ of vertices in $G$, let $\vec{V}$ be the vector such that $\vec{V}_i$ is the number of vertices in $V$ that are in community $i$. Define $w_1(V), w_2(V), \dots, w_h(V)$ such that $\vec{V} = \sum_i w_i(V)$ and $w_i(V)$ is an eigenvector of $PQ$ with eigenvalue $\lambda_i$ for each $i$.

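$w_1(V), \dots, w_h(V)$ are well defined because $\mathbb{R}^k$ is the direct sum of $PQ$’s eigenspaces (as $PQ$ is similar to the symmetric matrix $P^{1/2}QP^{1/2}$, it is diagonalizable). The key intuition behind their importance is that if $V'$ is the set of vertices adjacent to vertices in $V$, then $\vec{V'} \approx PQ\vec{V}$, so $w_i(V') \approx PQ\cdot w_i(V) = \lambda_i w_i(V)$.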
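The decomposition of Definition 7 can be computed through the symmetric matrix $S = P^{1/2}QP^{1/2}$: if $Su = \lambda u$ then $PQ(P^{1/2}u) = \lambda P^{1/2}u$. The Python sketch below uses this to split a count vector into eigencomponents (merging numerically equal eigenvalues). Applied to the vector $p$ itself, it can be used to check the identity $\sum_i w_i\cdot P^{-1}w_i = p\cdot P^{-1}p = 1$ used in the proof of Lemma 1. The function name and tolerance are our own choices.

```python
import numpy as np

def eigen_components(p, Q, vec, tol=1e-9):
    """Split vec into w_1 + ... + w_h with PQ w_i = lambda_i w_i (Definition 7).
    Returns a dict mapping each distinct eigenvalue to its component."""
    p = np.asarray(p, float)
    sq = np.sqrt(p)
    S = sq[:, None] * np.asarray(Q, float) * sq[None, :]
    lam, U = np.linalg.eigh(S)
    # Coordinates of P^{-1/2} vec in the orthonormal eigenbasis of S.
    coords = U.T @ (np.asarray(vec, float) / sq)
    comps = {}
    for value, u, a in zip(lam, U.T, coords):
        key = next((kk for kk in comps if abs(kk - value) < tol), value)
        comps[key] = comps.get(key, 0) + a * sq * u   # P^{1/2} u is an eigenvector of PQ
    return comps

p = np.array([0.2, 0.3, 0.5])
Q = np.array([[5, 1, 2], [1, 4, 0.5], [2, 0.5, 3]])
comps = eigen_components(p, Q, p)
print(sum(w @ (w / p) for w in comps.values()))  # 1.0 up to rounding
```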

Definition 8. For any vertex $v$, let $N_r(v)$ be the set of all vertices with shortest path to $v$ of length $r$. If there are multiple graphs that $v$ could be considered a vertex in, let $N_{r[G']}(v)$ be the set of all vertices with shortest paths in $G'$ to $v$ of length $r$.

We also typically refer to $\overrightarrow{N_{r[G']}(v)}$ as simply $N_{r[G']}(v)$, as the context will make it clear whether the expression refers to a set or a vector.

Definition 9. A vertex $v$ of a graph drawn from $G_1(n,p,Q)$ is $(R,x)$-good if for all $0 \le r < R$ and $w \in \mathbb{R}^k$ with $w\cdot Pw = 1$


and $(R,x)$-bad otherwise.

Note that since any such w can be written as a linear combination of the e*, v is 
(i?,x)-good if |e* • N r+ \(v) — e* • PQN r (v)\ < y/lhjk for all 1 < i < k and 

0 < r < R. 

Lemma 1. If $v$ is an $(R,x)$-good vertex of a graph drawn from $G_1(n,p,Q)$, then for every $0 \le r \le R$, $|N_r(v)| \le \lambda_1^r\sqrt{k}\big((\min_i p_i)^{-1/2} + x\big)$.

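Proof. First, note that for any eigenvector w of PQ and any $r < R$,

$$\left|(P^{-1}w) \cdot N_{r+1}(v) - (P^{-1}w) \cdot PQN_r(v)\right| \le \frac{x}{2}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{w \cdot P^{-1}w}.$$

So, by the triangle inequality,

$$\begin{aligned}
|(P^{-1}w) \cdot N_{r+1}(v)| &\le |(P^{-1}PQw) \cdot N_r(v)| + \frac{x}{2}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{w \cdot P^{-1}w}\\
&\le \lambda_1|(P^{-1}w) \cdot N_r(v)| + \frac{x}{2}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{w \cdot P^{-1}w}.
\end{aligned}$$

Thus, for any $r < R$, it must be the case that

$$\begin{aligned}
|(P^{-1}w) \cdot N_r(v)| &\le \lambda_1^r|(P^{-1}w) \cdot N_0(v)| + \sum_{r'=1}^{r}\lambda_1^{r-r'}\cdot\frac{x}{2}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^{r'-1}\sqrt{w \cdot P^{-1}w}\\
&\le \lambda_1^r\left(|w_{\sigma_v}/p_{\sigma_v}| + x\sqrt{w \cdot P^{-1}w}\right).
\end{aligned}$$

Here $\sigma_v$ is the community of v, so that $N_0(v) = e_{\sigma_v}$. The last step uses $|\lambda_\eta| \le \lambda_1$, which bounds the $r'$-th term of the sum by $x\lambda_1^r2^{-r'}\sqrt{w \cdot P^{-1}w}$.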

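Now, define $w_1, \dots, w_h$ such that $PQw_i = \lambda_i w_i$ for each i and $p = \sum_{i=1}^h w_i$. For any i, j,

$$\lambda_i\, w_i \cdot P^{-1}w_j = (PQw_i) \cdot P^{-1}w_j = w_i \cdot P^{-1}PQw_j = \lambda_j\, w_i \cdot P^{-1}w_j.$$

If $i \ne j$ then $\lambda_i \ne \lambda_j$, so this implies that $w_i \cdot P^{-1}w_j = 0$. It follows from this that

$$\sum_i w_i \cdot P^{-1}w_i = \sum_{i,j} w_i \cdot P^{-1}w_j = \Big(\sum_i w_i\Big) \cdot P^{-1}\Big(\sum_j w_j\Big) = p \cdot P^{-1}p = 1.$$

Also, for any i, it is the case that $|(w_i)_{\sigma_v}/p_{\sigma_v}| = \sqrt{(w_i)_{\sigma_v}^2/p_{\sigma_v}}\,/\sqrt{p_{\sigma_v}} \le (\min_j p_j)^{-1/2}\sqrt{w_i \cdot P^{-1}w_i}$. Therefore, for any $r < R$, we have that

$$\begin{aligned}
|N_r(v)| = |(P^{-1}p) \cdot N_r(v)| &\le \sum_i |(P^{-1}w_i) \cdot N_r(v)|\\
&\le \lambda_1^r\sum_i |(w_i)_{\sigma_v}/p_{\sigma_v}| + \lambda_1^r x\sum_i\sqrt{w_i \cdot P^{-1}w_i}\\
&\le \lambda_1^r\sqrt{k}\left((\min_j p_j)^{-1/2} + x\right).
\end{aligned}$$

The last step uses the Cauchy-Schwarz inequality $\sum_{i=1}^h\sqrt{w_i \cdot P^{-1}w_i} \le \sqrt{h}\sqrt{\sum_i w_i \cdot P^{-1}w_i} = \sqrt{h} \le \sqrt{k}$.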

□ 


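We will prove that for parameters satisfying the correct criteria, most vertices are good, but first we will need the following concentration result (see for example [RS13] page 19).

Theorem 5. Let $X_1, \dots, X_n$ be a sequence of independent random variables and $d, \sigma \in \mathbb{R}$ such that for all i, $|X_i - \mathbb{E}[X_i]| \le d$ with probability 1 and $\mathrm{Var}[X_i] \le \sigma^2$. Then for every $\alpha > 0$,

$$\mathbb{P}\left(\Big|\sum_{i=1}^n\left(X_i - \mathbb{E}[X_i]\right)\Big| \ge \alpha n\right) \le 2e^{-nD\left(\frac{\delta+\gamma}{1+\gamma}\,\middle\|\,\frac{\gamma}{1+\gamma}\right)},$$

where $\delta = \alpha/d$, $\gamma = \sigma^2/d^2$, and $D(p\|q) = p\ln(p/q) + (1-p)\ln((1-p)/(1-q))$.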

Note that for any vertex $v \in G$, $r \in \mathbb{Z}$, and $1 \le i \le k$,

$$e_i \cdot N_r(v) = \sum_{v' \in G} I_{(v' \in N_r(v))}\, e_i \cdot \overrightarrow{\{v'\}},$$

where $I_{(v' \in N_r(v))}$ is 1 if $v'$ is in $N_r(v)$ and 0 otherwise. Note that

$$\left|I_{(v' \in N_r(v))}\, e_i \cdot \overrightarrow{\{v'\}}\right| \le 1$$

for all $v'$, and

$$\mathbb{E}\left[\left(I_{(v' \in N_r(v))}\, e_i \cdot \overrightarrow{\{v'\}}\right)^2\right] \le \frac{d\,|N_{r-1}(v)|}{n},$$

because $I_{(v' \in N_r(v))}$ is nonzero with probability at most $d|N_{r-1}(v)|/n$.

A vertex in community $\sigma$ that is not in $N_{r'}(v)$ for $r' < r$ is in $N_r(v)$ with probability approximately $1 - e^{-e_\sigma \cdot QN_{r-1}(v)/n}$. There are approximately $p_\sigma n - O(|\cup_{r'<r}N_{r'}(v)|)$ such vertices. So the expected value of $e_i \cdot N_r(v)$ differs from $e_i \cdot PQN_{r-1}(v)$ by a term which is at most proportional to $|N_{r-1}(v)| \cdot |\cup_{r'=0}^{r-1}N_{r'}(v)|/n$.

Theorem 5 can be applied to this formula in order to bound the probability that a vertex will be bad, but first we need the following lemma.


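Lemma 2. For any $0 < \delta, \gamma < 1$,

$$D\left(\frac{\delta+\gamma}{1+\gamma}\,\middle\|\,\frac{\gamma}{1+\gamma}\right) \ge \frac{\delta^2(\gamma-\delta)}{2\gamma^2(1+\gamma)}.$$

Proof. First, note that if $x \ge 0$ then $\ln(1+x) \ge x - \frac{x^2}{2} = x\cdot\frac{2-x}{2}$. For $x \le 1$ this follows from the alternating series $\ln(1+x) = \sum_{i=1}^\infty(-1)^{i+1}x^i/i$. Similarly, if $-1 < x < 0$ then $\ln(1+x) = -\ln(1/(1+x)) = -\ln(1 - x/(1+x)) \ge x/(1+x)$. So,

$$\begin{aligned}
D\left(\frac{\delta+\gamma}{1+\gamma}\,\middle\|\,\frac{\gamma}{1+\gamma}\right) &= \frac{\delta+\gamma}{1+\gamma}\ln\left(\frac{\delta+\gamma}{\gamma}\right) + \frac{1-\delta}{1+\gamma}\ln(1-\delta)\\
&\ge \frac{\delta+\gamma}{1+\gamma}\cdot\frac{\delta}{\gamma}\cdot\frac{2\gamma-\delta}{2\gamma} + \frac{1-\delta}{1+\gamma}\cdot\frac{-\delta}{1-\delta}\\
&= \frac{\delta}{1+\gamma}\left(\frac{(\delta+\gamma)(2\gamma-\delta)}{2\gamma^2} - 1\right)\\
&= \frac{\delta}{1+\gamma}\cdot\frac{\gamma\delta-\delta^2}{2\gamma^2}\\
&= \frac{\delta^2(\gamma-\delta)}{2\gamma^2(1+\gamma)}.
\end{aligned}$$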


□ 
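The inequality of Lemma 2 is easy to sanity-check numerically, together with the exponent that appears in Theorem 5. The plain-Python sketch below evaluates $D\big(\frac{\delta+\gamma}{1+\gamma}\|\frac{\gamma}{1+\gamma}\big)$ and the lower bound $\frac{\delta^2(\gamma-\delta)}{2\gamma^2(1+\gamma)}$ on a grid. The function names are ours. This is a spot check, not a proof.

```python
import math


def kl_bernoulli(a, b):
    """D(a||b) = a ln(a/b) + (1 - a) ln((1 - a)/(1 - b)), with 0 ln 0 = 0."""
    def term(x, y):
        return 0.0 if x == 0 else x * math.log(x / y)
    return term(a, b) + term(1 - a, 1 - b)


def theorem5_exponent(delta, gamma):
    """D((delta + gamma)/(1 + gamma) || gamma/(1 + gamma)) from Theorem 5."""
    return kl_bernoulli((delta + gamma) / (1 + gamma), gamma / (1 + gamma))


def lemma2_bound(delta, gamma):
    """Right-hand side of Lemma 2."""
    return delta ** 2 * (gamma - delta) / (2 * gamma ** 2 * (1 + gamma))


# Spot-check Lemma 2 on a grid of 0 < delta, gamma < 1.
grid = [i / 50 for i in range(1, 50)]
worst = min(theorem5_exponent(d, g) - lemma2_bound(d, g) for d in grid for g in grid)
print(worst >= -1e-12)   # True
```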


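Lemma 3. Let $k \in \mathbb{Z}$, $p \in (0,1)^k$ with $|p| = 1$, Q be a symmetric matrix such that $\lambda_\eta^4 > 4\lambda_1^3$, and $0 < x < \frac{\lambda_1\sqrt{k}}{\lambda_\eta\min_i p_i}$. Then there exist

$$y < \frac{2k\exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\right)}{1 - \exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right)}$$

and $R(n) = \omega(1)$ such that at least $1 - y$ of the vertices of a graph drawn from $G_1(n,p,Q)$ are $(R(n),x)$-good with probability $1 - o(1)$.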

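Proof. First, consider a constant R. Now, define $w_1, \dots, w_h$ such that $PQw_i = \lambda_i w_i$ for each i and $p = \sum_{i=1}^h w_i$. If v is $(r,x)$-good, then by the same logic used in the proof of Lemma 1, each vertex of G is in $N_{r+1}(v)$ with probability at most

$$\begin{aligned}
p \cdot QN_r(v)/n &= \frac{1}{n}\sum_i (P^{-1}w_i) \cdot PQN_r(v)\\
&\le \frac{1}{n}\sum_i\lambda_1^{r+1}\left(|(w_i)_{\sigma_v}/p_{\sigma_v}| + x\sqrt{w_i \cdot P^{-1}w_i}\right)\\
&\le \frac{1}{n}\lambda_1^{r+1}\sqrt{k}\left((\min_i p_i)^{-1/2} + x\right).
\end{aligned}$$

Recall that a vertex v is $(R,x)$-good if (but not only if)

$$|e_i \cdot N_{r+1}(v) - e_i \cdot PQN_r(v)| \le \frac{x}{2}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{p_i/k}$$

for all $0 \le r < R$ and $1 \le i \le k$. This condition holds for i and r with probability at least $1 - 2e^{-nD\left(\frac{\delta+\gamma}{1+\gamma}\|\frac{\gamma}{1+\gamma}\right)}$, where $\delta \sim \frac{x}{2}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{p_i/k}/n$ and $\gamma \sim \lambda_1^{r+1}\sqrt{k}((\min_j p_j)^{-1/2}+x)/n = \Omega(d|N_r(v)|/n)$. Note that $\delta \le \gamma/2$, so $\gamma - \delta \ge \gamma/2$.

Combine this with Lemma 2, $p_i \ge \min_i p_i$, and $\gamma \to 0$. In the limit as $n \to \infty$, v is $(R,x)$-bad with probability at most

$$\begin{aligned}
\sum_{r=0}^{R-1}\sum_{i=1}^k 2e^{-nD\left(\frac{\delta+\gamma}{1+\gamma}\|\frac{\gamma}{1+\gamma}\right)} &\le \sum_{r=0}^{R-1}2k\,e^{-n\frac{\delta^2(\gamma-\delta)}{2\gamma^2(1+\gamma)}} \le \sum_{r=0}^{R-1}2k\,e^{-n\frac{\delta^2}{4\gamma(1+\gamma)}}\\
&\le \sum_{r=0}^{\infty}2k\exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{\lambda_\eta^4}{4\lambda_1^3}\right)^r\right)\\
&\le \sum_{r=0}^{\infty}2k\exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(1 + r\left(\frac{\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right)\right)\\
&= \frac{2k\exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\right)}{1 - \exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right)}.
\end{aligned}$$

The second-to-last step uses $(1+a)^r \ge 1 + ra$ for $a = \frac{\lambda_\eta^4}{4\lambda_1^3} - 1 > 0$.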




Given random v and v', if v' is $(R,x)$-good then there are at most $\sum_{r=0}^R\lambda_1^r\sqrt{k}((\min_i p_i)^{-1/2}+x)$ vertices in $\cup_{r=0}^R N_r(v')$. Note that $\cup_{r=0}^R N_r(v)$ is disjoint, with probability $1 - O(1/n)$, from any set of $\sum_{r=0}^R\lambda_1^r\sqrt{k}((\min_i p_i)^{-1/2}+x)$ vertices that were chosen independently of v. So

$$\big|\mathbb{P}[v \text{ is } (R,x)\text{-good}] - \mathbb{P}[v \text{ is } (R,x)\text{-good} \mid v' \text{ is } (R,x)\text{-good}]\big| = O(1/n).$$

That means that for any

$$y > \sum_{r=0}^{\infty}2k\exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{\lambda_\eta^4}{4\lambda_1^3}\right)^r\right),$$

at least $(1-y)n$ of the vertices in a graph drawn from $G_1(n,p,Q)$ are $(R,x)$-good with probability $1 - o(1)$.

So, for every r there exists $N_r$ such that for all $n > N_r$, at least $(1-y)n$ of the vertices of a graph drawn from $G_1(n,p,Q)$ are $(r,x)$-good with probability at least $1 - 2^{-r}$. Now, let $R(n) = \sup\{r : n > N_r\}$. It is clear that $\lim_{n\to\infty}R(n) = \infty$. For any n, at least $(1-y)n$ of the vertices of a graph drawn from $G_1(n,p,Q)$ are $(R(n),x)$-good with probability at least $1 - 2^{-R(n)} = 1 - o(1)$. □
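The last two steps of the bound in the proof of Lemma 3 replace the series by a closed form using $(1+a)^r \ge 1 + ra$. Write $A$ for the exponent prefactor and $B = \lambda_\eta^4/(4\lambda_1^3) > 1$. These are our shorthand, not the paper's notation. The Python sketch below compares a partial sum of $\sum_{r \ge 0}2ke^{-AB^r}$ with $2ke^{-A}/(1-e^{-A(B-1)})$. It only illustrates the inequality for sample values.

```python
import math


def bad_fraction_series(A, B, k, terms=200):
    """Partial sum of sum_{r>=0} 2k exp(-A * B**r)."""
    total = 0.0
    for r in range(terms):
        expo = A * B ** r
        if expo > 700:          # exp(-700) is ~1e-304; later terms are smaller still
            break
        total += 2 * k * math.exp(-expo)
    return total


def bad_fraction_closed_form(A, B, k):
    """2k e^{-A} / (1 - e^{-A (B - 1)}), valid for A > 0 and B > 1."""
    return 2 * k * math.exp(-A) / (1 - math.exp(-A * (B - 1)))


for A, B in [(0.5, 1.2), (2.0, 1.5), (5.0, 3.0)]:
    series = bad_fraction_series(A, B, k=3)
    closed = bad_fraction_closed_form(A, B, k=3)
    print(f"A={A}, B={B}: series={series:.4g} <= closed form={closed:.4g}")
```

The closed form is what appears in the statements of Lemma 3 and Lemma 9. The series is the sharper quantity bounded in the proof.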

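Lemma 4. Let $k \in \mathbb{Z}$, $p \in (0,1)^k$ with $|p| = 1$, Q be a symmetric matrix such that $\lambda_\eta^4 > 4\lambda_1^3$, $R(n) = \omega(1)$, and $\epsilon > 0$ such that $(2\lambda_1^3/\lambda_\eta^2)^{1-\epsilon/3} < \lambda_1$. A vertex of a graph drawn from $G_1(n,p,Q)$ is $(R(n),x)$-good but $\left(\frac{1-\epsilon/3}{\ln\lambda_1}\ln n,\,x\right)$-bad with probability $o(1)$.

Proof. For any $r < \frac{1-\epsilon/3}{\ln\lambda_1}\ln n$, if v is $(r,x)$-good then

$$|\cup_{r'=0}^{r}N_{r'}(v)| \le \sum_{r'=0}^{r}\lambda_1^{r'}\sqrt{k}((\min_i p_i)^{-1/2}+x) < \lambda_1^{r+1}\sqrt{k}((\min_j p_j)^{-1/2}+x)/(\lambda_1-1).$$

By assumption,

$$\begin{aligned}
\frac{|N_r(v)| \cdot |\cup_{r'=0}^{r}N_{r'}(v)|}{\frac{x}{2}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{p_j/k}} &\le \left(\frac{2\lambda_1^3}{\lambda_\eta^2}\right)^r\cdot\frac{2\lambda_1((\min_i p_i)^{-1/2}+x)^2\sqrt{k^3/p_j}}{x(\lambda_1-1)\lambda_\eta}\\
&\le \left(\frac{2\lambda_1^3}{\lambda_\eta^2}\right)^{\frac{1-\epsilon/3}{\ln\lambda_1}\ln n}\cdot\frac{2\lambda_1((\min_i p_i)^{-1/2}+x)^2\sqrt{k^3/p_j}}{x(\lambda_1-1)\lambda_\eta}\\
&= o(n).
\end{aligned}$$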


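So, if n is sufficiently large, then the expected value of $e_i \cdot N_{r+1}(v)$ differs from $e_i \cdot PQN_r(v)$ by less than $\frac{x}{4}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{p_i/k}$ for all $r < \frac{1-\epsilon/3}{\ln\lambda_1}\ln n$. Fix such an n. An $(R(n),x)$-good vertex is also $\left(\frac{1-\epsilon/3}{\ln\lambda_1}\ln n,\,x\right)$-good if $e_i \cdot N_{r+1}(v)$ differs from its expected value by at most $\frac{x}{4}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{p_i/k}$ for $R(n) \le r < \frac{1-\epsilon/3}{\ln\lambda_1}\ln n$ and $1 \le i \le k$. Note that

$$\sum_{r=0}^{\frac{1-\epsilon/3}{\ln\lambda_1}\ln n}\lambda_1^r\sqrt{k}((\min_j p_j)^{-1/2}+x) = o(n),$$

so for a given i and r this holds with probability at least $1 - 2e^{-nD\left(\frac{\delta+\gamma}{1+\gamma}\|\frac{\gamma}{1+\gamma}\right)}$, where $\delta \sim \frac{x}{4}\lambda_\eta\left(\frac{\lambda_\eta^2}{2\lambda_1}\right)^r\sqrt{p_i/k}/n$ and $\gamma \sim \lambda_1^{r+1}\sqrt{k}((\min_j p_j)^{-1/2}+x)/n$. By Lemma 2,

$$\begin{aligned}
D\left(\frac{\delta+\gamma}{1+\gamma}\,\middle\|\,\frac{\gamma}{1+\gamma}\right) &\ge \frac{\delta^2(\gamma-\delta)}{2\gamma^2(1+\gamma)} \sim \frac{\delta^2}{2\gamma}\\
&\sim \frac{x^2\lambda_\eta^2p_i}{32\lambda_1k^{3/2}((\min_j p_j)^{-1/2}+x)n}\left(\frac{\lambda_\eta^4}{4\lambda_1^3}\right)^r\\
&\ge \frac{x^2\lambda_\eta^2\min_j p_j}{32\lambda_1k^{3/2}((\min_j p_j)^{-1/2}+x)n}\left(1 + r\left(\frac{\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right).
\end{aligned}$$

So, there exist N, a, and $b > 0$ such that if $n > N$ and $r < \frac{1-\epsilon/3}{\ln\lambda_1}\ln n$, then $D\left(\frac{\delta+\gamma}{1+\gamma}\|\frac{\gamma}{1+\gamma}\right) > (a+br)/n$. So, the probability that v is $(R(n),x)$-good but $\left(\frac{1-\epsilon/3}{\ln\lambda_1}\ln n,\,x\right)$-bad is at most

$$\sum_{r=R(n)}^{\frac{1-\epsilon/3}{\ln\lambda_1}\ln n}2k\,e^{-nD\left(\frac{\delta+\gamma}{1+\gamma}\|\frac{\gamma}{1+\gamma}\right)} \le 2k\sum_{r=R(n)}^{\infty}e^{-a-br} \le \frac{2k\,e^{-a-bR(n)}}{1-e^{-b}} = o(1).$$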


□ 

Definition 10. For any vertices $v, v' \in G$, $r, r' \in \mathbb{Z}$, and subset E of G's edges, let $N_{r,r'[E]}(v \cdot v')$ be the number of pairs of vertices $(v_1, v_2)$ such that $v_1 \in N_{r[G\setminus E]}(v)$, $v_2 \in N_{r'[G\setminus E]}(v')$, and $(v_1, v_2) \in E$.

Note that if $N_{r[G\setminus E]}(v)$ and $N_{r'[G\setminus E]}(v')$ have already been computed, $N_{r,r'[E]}(v \cdot v')$ can be computed by means of the following algorithm, where $E[v] = \{v' : (v,v') \in E\}$:


compute_N_{r,r'[E]}(v · v'):
    count = 0
    for v_1 ∈ N_{r'[G\E]}(v'):
        for v_2 ∈ E[v_1]:
            if v_2 ∈ N_{r[G\E]}(v):
                count = count + 1
    return count
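A direct Python transcription of compute_N_{r,r'[E]}, together with the random edge split described in the text, might look as follows. This is a sketch under our own assumptions: vertices are integers 0..n−1, the spheres are passed in as Python sets so that the membership test takes constant average time, and the helper names are hypothetical.

```python
import random


def split_edges(edges, c, rng):
    """Put each edge in E independently with probability c; the others stay in G minus E."""
    E, rest = [], []
    for e in edges:
        (E if rng.random() < c else rest).append(e)
    return E, rest


def adjacency(n, edges):
    """Adjacency sets, so that adj[u] plays the role of E[u]."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def compute_N(sphere_v, sphere_vp, E_adj):
    """Count pairs (v1, v2) with v1 in sphere_v = N_r(v), v2 in sphere_vp = N_r'(v')
    (spheres taken in G minus E) and (v1, v2) an edge of E."""
    count = 0
    for v2 in sphere_vp:
        for v1 in E_adj[v2]:
            if v1 in sphere_v:
                count += 1
    return count


E_adj = adjacency(4, [(0, 1), (2, 3), (1, 2)])
print(compute_N({0, 2}, {1, 3}, E_adj))   # 3: the E-edges (0,1), (2,1), (2,3)
```

The loop touches each E-edge incident to the sphere around v' once. This is the source of the average cost proportional to the size of that sphere times the average degree.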


Note that compute_N_{r,r'[E]} runs in $O((d+1)|N_{r'[G\setminus E]}(v')|)$ average time. The plan is to independently put each edge in G in E with probability c. Then the probability distribution of $G\setminus E$ will be $G_1(n,p,(1-c)Q)$, so $N_{r[G\setminus E]}(v) \approx ((1-c)PQ)^re_{\sigma_v}$ and $N_{r'[G\setminus E]}(v') \approx ((1-c)PQ)^{r'}e_{\sigma_{v'}}$. So, it will hopefully be the case that

$$N_{r,r'[E]}(v \cdot v') \approx ((1-c)PQ)^re_{\sigma_v} \cdot cQ((1-c)PQ)^{r'}e_{\sigma_{v'}}/n = c(1-c)^{r+r'}e_{\sigma_v} \cdot Q(PQ)^{r+r'}e_{\sigma_{v'}}/n.$$

More rigorously, we have that:

Lemma 5. Choose p, Q, G drawn from $G_1(n,p,Q)$, E randomly selected from G's edges such that each of G's edges is independently assigned to E with probability c, and $v, v' \in G$ chosen independently from G's vertices. Then with probability $1 - o(1)$,

$$\left|N_{r,r'[E]}(v \cdot v') - N_{r[G\setminus E]}(v) \cdot cQN_{r'[G\setminus E]}(v')/n\right| \le \left(1 + \sqrt{|N_{r[G\setminus E]}(v)| \cdot |N_{r'[G\setminus E]}(v')|/n}\right)\log n.$$

Proof. Roughly speaking, for each $v_1 \in N_{r[G\setminus E]}(v)$ and $v_2 \in N_{r'[G\setminus E]}(v')$, $(v_1,v_2) \in E$ with probability $cQ_{\sigma_{v_1},\sigma_{v_2}}/n$. This is complicated by the facts that $(v_1,v_1)$ is never in E and no edge is in both $G\setminus E$ and E. However, this changes the expected value of $N_{r,r'[E]}(v \cdot v')$ given $G\setminus E$ by at most a constant unless G has more than double its expected number of edges, something that happens with probability $o(1)$.

Furthermore, whether $(v_1,v_2)$ is in E is independent of whether $(v_1',v_2')$ is in E unless $(v_1',v_2') = (v_1,v_2)$ or $(v_1',v_2') = (v_2,v_1)$. So, the variance of $N_{r,r'[E]}(v \cdot v')$ is proportional to its expected value, which is $O(|N_{r[G\setminus E]}(v)| \cdot |N_{r'[G\setminus E]}(v')|/n)$. $N_{r,r'[E]}(v \cdot v')$ is within $\log n$ standard deviations of its expected value with probability $1 - o(1)$, which completes the proof. □

Note that if w is an eigenvector of $(1-c)PQ$, then $\sqrt{P}^{-1}w$ is an eigenvector of the symmetric matrix $(1-c)\sqrt{P}Q\sqrt{P}$. Eigenvectors of a symmetric matrix with different eigenvalues are orthogonal. So $w_i \cdot Qw_j = \lambda_j\,w_i \cdot P^{-1}w_j = 0$ for $i \ne j$, and we have

$$N_{r[G\setminus E]}(v) \cdot cQN_{r'[G\setminus E]}(v')/n = \frac{c}{n}\sum_i w_i(N_{r[G\setminus E]}(v)) \cdot Qw_i(N_{r'[G\setminus E]}(v')).$$

Lemma 6 (Decomposition Equation Lemma). Let the following hold:

- $x > 0$ and $0 < c < 1$ are such that $(1-c)\lambda_\eta^2 > \lambda_1$, and $\epsilon > 0$;
- G is drawn from $G_1(n,p,Q)$, and E is a subset of G's edges that independently contains each edge with probability c;
- $r, r' \in \mathbb{Z}^+$ satisfy $r + r' \ge (1+\epsilon)\log n/\log((1-c)\lambda_\eta^2/\lambda_1)$ and $r \ge r'$;
- $v, v' \in G$ are chosen independently of G's adjacency matrix.

Then the system of equations

$$\sum_i((1-c)\lambda_i)^{r+r'+j+1}z_i = \frac{(1-c)n}{c}N_{r+j,r'[E]}(v \cdot v') \quad \text{for } 0 \le j < \eta$$

has a unique solution. Furthermore, if v is $(r+\eta,x)$-good and v' is $(r'+1,x)$-good with respect to $G\setminus E$, then

$$|z_i - w_i(\{v\}) \cdot P^{-1}w_i(\{v'\})| \le 2x(\min_j p_j)^{-1/2} + x^2 + o(1)$$

for all i with probability $1 - o(1)$.

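Proof. First, note that by $(r+\eta,x)$-goodness of v, for each $0 \le j < \eta$,

$$\left|w \cdot w_i(N_{r+j[G\setminus E]}(v)) - ((1-c)\lambda_i)^j\,w \cdot w_i(N_{r[G\setminus E]}(v))\right| \le x(d(1-c))^{j-1}(1-c)\lambda_\eta\left(\frac{(1-c)\lambda_\eta^2}{2\lambda_1}\right)^r$$

whenever $w \cdot Pw = 1$.

Next, bound the error coming from Lemma 5. By Lemma 1 applied to $G\setminus E$, whose eigenvalues are $(1-c)\lambda_i$, we have $|N_{r+j[G\setminus E]}(v)| \cdot |N_{r'[G\setminus E]}(v')| = O(((1-c)\lambda_1)^{r+r'+j})$. So the error term of Lemma 5, multiplied by $n/c$, is $O\big((n + \sqrt{n((1-c)\lambda_1)^{r+r'+j}})\log n\big)$. Since $|\lambda_\eta| \le \lambda_1$ and $r + r' \ge (1+\epsilon)\log n/\log((1-c)\lambda_\eta^2/\lambda_1)$,

$$((1-c)|\lambda_\eta|)^{r+r'} \ge \left(\frac{(1-c)\lambda_\eta^2}{\lambda_1}\right)^{r+r'} \ge n^{1+\epsilon} \quad\text{and}\quad \frac{n((1-c)\lambda_1)^{r+r'}}{((1-c)\lambda_\eta)^{2(r+r')}} = n\left(\frac{(1-c)\lambda_\eta^2}{\lambda_1}\right)^{-(r+r')} \le n^{-\epsilon}.$$

So this error is $o(((1-c)|\lambda_\eta|)^{r+r'})$. The goodness error above, paired with $Qw_i(N_{r'[G\setminus E]}(v')) = O(((1-c)\lambda_1)^{r'})$, is at most a constant times $2^{-r}(|\lambda_\eta|/\lambda_1)^{r-r'}((1-c)|\lambda_\eta|)^{r+r'}$. This is also $o(((1-c)|\lambda_\eta|)^{r+r'})$, because $r \ge r'$ and $r \to \infty$. So, with probability $1 - o(1)$,

$$\left|\sum_i((1-c)\lambda_i)^j\,w_i(N_{r[G\setminus E]}(v)) \cdot Qw_i(N_{r'[G\setminus E]}(v')) - \frac{n}{c}N_{r+j,r'[E]}(v \cdot v')\right| = o\big(((1-c)|\lambda_\eta|)^{r+r'}\big).$$

Now, let M be the matrix such that $M_{j,i} = ((1-c)\lambda_i)^j$ for $0 \le j < \eta$ and $1 \le i \le \eta$. This matrix is invertible because the $\lambda_i$ are distinct. The coefficient matrix of the system is M times the diagonal matrix $\mathrm{diag}(((1-c)\lambda_i)^{r+r'+1})$, which is invertible because $\lambda_i \ne 0$ for $i \le \eta$. So the system of equations has a unique solution.

Furthermore, for fixed values of c and the $\lambda_i$, $((1-c)\lambda_i)^{r+r'+1}z_i - (1-c)\,w_i(N_{r[G\setminus E]}(v)) \cdot Qw_i(N_{r'[G\setminus E]}(v'))$ is a fixed linear combination of these error terms. So it is $o(((1-c)|\lambda_\eta|)^{r+r'})$. Dividing by $((1-c)|\lambda_i|)^{r+r'+1} \ge ((1-c)|\lambda_\eta|)^{r+r'+1}$ shows that $z_i$ differs from $(1-c)((1-c)\lambda_i)^{-r-r'-1}w_i(N_{r[G\setminus E]}(v)) \cdot Qw_i(N_{r'[G\setminus E]}(v'))$ by $o(1)$.

Write $N_r$ for $N_{r[G\setminus E]}$ and use $Qw_i = \lambda_iP^{-1}w_i$. Then

$$\begin{aligned}
|z_i - w_i(\{v\}) \cdot P^{-1}w_i(\{v'\})| &\le |z_i - (1-c)((1-c)\lambda_i)^{-r-r'-1}w_i(N_r(v)) \cdot Qw_i(N_{r'}(v'))|\\
&\quad + (1-c)\left|((1-c)\lambda_i)^{-r-r'-1}w_i(N_r(v)) \cdot Qw_i(N_{r'}(v')) - ((1-c)\lambda_i)^{-r'-1}w_i(\{v\}) \cdot Qw_i(N_{r'}(v'))\right|\\
&\quad + \left|(1-c)((1-c)\lambda_i)^{-r'-1}w_i(\{v\}) \cdot Qw_i(N_{r'}(v')) - w_i(\{v\}) \cdot P^{-1}w_i(\{v'\})\right|\\
&\le \left|((1-c)\lambda_i)^{-r'}w_i(N_{r'}(v')) \cdot P^{-1}\left[((1-c)\lambda_i)^{-r}w_i(N_r(v)) - w_i(\{v\})\right]\right|\\
&\quad + \left|w_i(\{v\}) \cdot P^{-1}\left[((1-c)\lambda_i)^{-r'}w_i(N_{r'}(v')) - w_i(\{v'\})\right]\right| + o(1).
\end{aligned}$$

By goodness of v and v', each bracket has $P^{-1}$-norm at most x, as in the proof of Lemma 1. Applying the Cauchy-Schwarz inequality for the inner product $(a,b) \mapsto a \cdot P^{-1}b$ and setting $u_i := ((1-c)\lambda_i)^{-r'}w_i(N_{r'}(v'))$, this is less than or equal to

$$\begin{aligned}
&x\sqrt{u_i \cdot P^{-1}u_i} + x\sqrt{w_i(\{v\}) \cdot P^{-1}w_i(\{v\})} + o(1)\\
&\le x\sqrt{w_i(\{v'\}) \cdot P^{-1}w_i(\{v'\}) + 2w_i(\{v'\}) \cdot P^{-1}[u_i - w_i(\{v'\})] + [u_i - w_i(\{v'\})] \cdot P^{-1}[u_i - w_i(\{v'\})]} + x\sqrt{1/\min_j p_j} + o(1)\\
&\le x\sqrt{1/\min_j p_j + 2x/\sqrt{\min_j p_j} + x^2} + x/\sqrt{\min_j p_j} + o(1)\\
&= x^2 + 2x(\min_j p_j)^{-1/2} + o(1)
\end{aligned}$$

with probability $1 - o(1)$ for all i. □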
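Numerically, a system of the form $\sum_i((1-c)\lambda_i)^{r+r'+j+1}z_i = b_j$, $0 \le j < \eta$, is best solved through a factorization. The coefficient matrix equals $M\,\mathrm{diag}(((1-c)\lambda_i)^{r+r'+1})$, where $M_{j,i} = ((1-c)\lambda_i)^j$ is a Vandermonde matrix. Below is a NumPy sketch that assumes the right-hand sides $b_j$ have already been formed from the counts $N_{r+j,r'[E]}(v \cdot v')$. The function name is ours.

```python
import numpy as np


def solve_decomposition(lams, c, r, rp, rhs):
    """Solve sum_i ((1 - c) lam_i)^(r + rp + j + 1) z_i = rhs[j] for j = 0..eta-1.

    lams: the distinct nonzero eigenvalues lambda_1..lambda_eta of PQ.
    The matrix factors as M @ diag(s) with M[j, i] = mu_i**j (Vandermonde in
    mu_i = (1 - c) lam_i) and s_i = mu_i**(r + rp + 1). Solving with M first
    and then dividing by s keeps the factorized matrix independent of r and rp.
    """
    mu = (1 - c) * np.asarray(lams, dtype=float)
    M = np.vander(mu, N=len(mu), increasing=True).T   # M[j, i] = mu_i ** j
    s = mu ** (r + rp + 1)
    y = np.linalg.solve(M, np.asarray(rhs, dtype=float))
    return y / s


lams, c, r, rp = [3.0, -1.5, 0.8], 0.2, 5, 3
z_true = np.array([0.4, -0.1, 0.25])
mu = (1 - c) * np.array(lams)
rhs = [np.sum(mu ** (r + rp + j + 1) * z_true) for j in range(len(lams))]
print(solve_decomposition(lams, c, r, rp, rhs))   # approximately [0.4, -0.1, 0.25]
```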

For any two vertices in different communities, v and v', the fact that Q's rows are distinct implies that $Q(\overrightarrow{\{v\}} - \overrightarrow{\{v'\}}) \ne 0$. So, $w_i(\{v\}) \ne w_i(\{v'\})$ for some $1 \le i \le \eta$. That means that for any two vertices v and v',

$$(w_i(\{v\}) - w_i(\{v'\})) \cdot P^{-1}(w_i(\{v\}) - w_i(\{v'\})) = w_i(\{v\}) \cdot P^{-1}w_i(\{v\}) - 2w_i(\{v\}) \cdot P^{-1}w_i(\{v'\}) + w_i(\{v'\}) \cdot P^{-1}w_i(\{v'\}) \ge 0$$

for all $1 \le i \le \eta$, with equality for all i if and only if v and v' are in the same community. This also implies the following. Given a vertex v, another vertex v' in the same community, and a vertex v'' in a different community,

$$2w_i(\{v\}) \cdot P^{-1}w_i(\{v'\}) - w_i(\{v'\}) \cdot P^{-1}w_i(\{v'\}) \ge 2w_i(\{v\}) \cdot P^{-1}w_i(\{v''\}) - w_i(\{v''\}) \cdot P^{-1}w_i(\{v''\})$$

for all $1 \le i \le \eta$, and the inequality is strict for at least one i. This suggests the following algorithms for classifying vertices.


Vertex_comparison_algorithm(v, v', r, r', E, x, c):

(Assumes that $N_{r''[G\setminus E]}(v)$ and $N_{r''[G\setminus E]}(v')$ have already been computed for $r'' \le r + \eta$)

Solve the equations given in the previous lemma for $(v, v', r, r')$, $(v, v, r, r')$, and $(v', v', r, r')$ in order to compute $z_i(v \cdot v')$, $z_i(v \cdot v)$, and $z_i(v' \cdot v')$.

If $\exists i : z_i(v \cdot v) - 2z_i(v \cdot v') + z_i(v' \cdot v') > 5(2x(\min_j p_j)^{-1/2} + x^2)$, then conclude that v and v' are in different communities.

Otherwise, conclude that v and v' are in the same community.
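Once the $z_i$ have been obtained, the decision step of Vertex_comparison_algorithm is a simple threshold test. Below is a plain-Python sketch. The lists `z_vv`, `z_vvp` and `z_vpvp` hold $z_i(v \cdot v)$, $z_i(v \cdot v')$ and $z_i(v' \cdot v')$ for $i = 1, \dots, \eta$. These names are ours.

```python
import math


def different_communities(z_vv, z_vvp, z_vpvp, x, min_p):
    """True iff some i has z_i(v.v) - 2 z_i(v.v') + z_i(v'.v') > 5 (2x min_p^(-1/2) + x^2)."""
    tol = 2 * x / math.sqrt(min_p) + x * x
    return any(a - 2 * b + d > 5 * tol for a, b, d in zip(z_vv, z_vvp, z_vpvp))


# x = 0.1, min_p = 0.25 gives tol = 0.41, so the threshold is 2.05.
print(different_communities([1.0, 0.5], [1.0, 0.5], [1.0, 0.5], 0.1, 0.25))  # False
print(different_communities([1.0, 3.0], [1.0, 0.0], [1.0, 1.0], 0.1, 0.25))  # True
```

Each quantity $z_i(v\cdot v) - 2z_i(v\cdot v') + z_i(v'\cdot v')$ estimates the squared $P^{-1}$-distance between $w_i(\{v\})$ and $w_i(\{v'\})$. That distance vanishes for every i exactly when the two vertices share a community.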
Lemma 7. Assume that each of G's edges was independently assigned to E with probability c. Then this algorithm runs in $O(((1-c)\lambda_1)^{r'})$ average time. Furthermore, suppose the conditions of the decomposition equation lemma are satisfied and $13(2x(\min_j p_j)^{-1/2} + x^2)$ is less than the minimum nonzero value of $(w_i(\{v\}) - w_i(\{v'\})) \cdot P^{-1}(w_i(\{v\}) - w_i(\{v'\}))$. Then the algorithm returns the correct result with probability $1 - o(1)$.

Proof. The slowest step of the algorithm is using compute_N_{r+j,r'[E]}(v · v') in order to calculate the constant terms for the equations. This runs in an average time of $O((d+1)(\mathbb{E}[|N_{r'[G\setminus E]}(v)|] + \mathbb{E}[|N_{r'[G\setminus E]}(v')|])) = O(((1-c)\lambda_1)^{r'})$ and must be done $3\eta$ times. If the conditions of the decomposition equation lemma are satisfied, then with probability $1 - o(1)$ the $z_i$ are all within $\frac{5}{4}(2x(\min_j p_j)^{-1/2} + x^2)$ of the products they seek to approximate. In that case

$$z_i(v \cdot v) - 2z_i(v \cdot v') + z_i(v' \cdot v') > 5(2x(\min_j p_j)^{-1/2} + x^2)$$

if and only if

$$(w_i(\{v\}) - w_i(\{v'\})) \cdot P^{-1}(w_i(\{v\}) - w_i(\{v'\})) \ne 0,$$

which is true for some i if and only if v and v' are in different communities.


□ 


Vertex_classification_algorithm(v[], v', r, r', E, x, c):

(Assumes that:
- $N_{r''[G\setminus E]}(v[\sigma])$ has already been computed for $0 \le \sigma < k$ and $r'' \le r + \eta$;
- $N_{r''[G\setminus E]}(v')$ has already been computed for all $r'' \le r'$;
- $z_i(v[\sigma] \cdot v[\sigma])$, as described in the previous algorithm, has already been computed for each i and $\sigma$.)

Solve the equations in the decomposition equation lemma for $(v[\sigma], v', r, r')$ in order to compute $z_i(v[\sigma] \cdot v')$ for each $\sigma$.

If there exists a unique $\sigma$ such that for all $\sigma' \ne \sigma$ and all i,

$$z_i(v[\sigma] \cdot v[\sigma]) - 2z_i(v[\sigma] \cdot v') < z_i(v[\sigma'] \cdot v[\sigma']) - 2z_i(v[\sigma'] \cdot v') + \frac{19}{3}\cdot(2x(\min_j p_j)^{-1/2} + x^2),$$

then conclude that v' is in the same community as $v[\sigma]$.

Otherwise, Fail.
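The decision rule of Vertex_classification_algorithm can be sketched in the same way. The inputs are assumed to be precomputed: `z_self[s][i]` $= z_i(v[s]\cdot v[s])$ and `z_cross[s][i]` $= z_i(v[s]\cdot v')$. The constant `margin` is a parameter that defaults to 19/3, the threshold factor of the algorithm as we read it. `None` stands for Fail. All names are ours.

```python
import math


def classify_vertex(z_self, z_cross, x, min_p, margin=19 / 3):
    """Return the unique index s passing the test for every s' != s and every i, else None (Fail)."""
    tol = 2 * x / math.sqrt(min_p) + x * x
    k = len(z_self)
    score = [[a - 2 * b for a, b in zip(z_self[s], z_cross[s])] for s in range(k)]
    passing = [
        s for s in range(k)
        if all(score[s][i] < score[t][i] + margin * tol
               for t in range(k) if t != s
               for i in range(len(score[s])))
    ]
    return passing[0] if len(passing) == 1 else None


# One eigencomponent, weights P^{-1} = 1: w(v[0]) = 1, w(v[1]) = -3, and v' behaves like v[0].
print(classify_vertex([[1.0], [9.0]], [[1.0], [-3.0]], x=0.1, min_p=0.25))  # 0
```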


Lemma 8. Assume that E was generated properly. Then this algorithm runs in $O(((1-c)\lambda_1)^{r'})$ average time. Furthermore, assume the following:

- r, r', x, c, and the graph's parameters satisfy the conditions of the decomposition equation lemma;
- v[] contains exactly one vertex from each community;
- $v[\sigma]$ is $(r+\eta,x)$-good with respect to $G\setminus E$ for all $\sigma$, and v' is $(r'+1,x)$-good with respect to $G\setminus E$;
- $13(2x(\min_j p_j)^{-1/2} + x^2)$ is less than the minimum nonzero value of $(w_i(\{v\}) - w_i(\{v'\})) \cdot P^{-1}(w_i(\{v\}) - w_i(\{v'\}))$.

Then this algorithm classifies v' correctly with probability $1 - o(1)$.

Proof. Again, the slowest step of the algorithm is using compute_N_{r+j,r'[E]}(v[σ] · v') in order to calculate the constant terms for the equations. This runs in an average time of $O((d+1)\mathbb{E}[|N_{r'[G\setminus E]}(v')|]) = O(((1-c)\lambda_1)^{r'})$ and must be done $k\eta$ times. If the conditions given above are satisfied, then with probability $1 - o(1)$ each $z_i$ is within $\frac{19}{18}(2x(\min_j p_j)^{-1/2} + x^2)$ of the product it seeks to approximate. If this is the case, then

$$z_i(v[\sigma] \cdot v[\sigma]) - 2z_i(v[\sigma] \cdot v') < z_i(v[\sigma'] \cdot v[\sigma']) - 2z_i(v[\sigma'] \cdot v') + \frac{19}{3}\cdot(2x(\min_j p_j)^{-1/2} + x^2)$$

iff

$$2w_i(\{v'\}) \cdot P^{-1}w_i(\{v[\sigma]\}) - w_i(\{v[\sigma]\}) \cdot P^{-1}w_i(\{v[\sigma]\}) \ge 2w_i(\{v'\}) \cdot P^{-1}w_i(\{v[\sigma']\}) - w_i(\{v[\sigma']\}) \cdot P^{-1}w_i(\{v[\sigma']\}).$$

This holds for all i and $\sigma'$ iff v' is in the same community as $v[\sigma]$, so the algorithm returns the correct result with probability $1 - o(1)$. □

At this point, we can finally start giving algorithms for classifying a graph’s vertices. 


Unreliable_graph_classification_algorithm(G, c, m, ε, x):

Randomly assign each edge in G to E independently with probability c.

Randomly select m vertices in G, v[0], ..., v[m − 1].

Let $r = (1 - \frac{\epsilon}{3})\log n/\log((1-c)\lambda_1) - \eta$ and $r' = \frac{2\epsilon}{3}\cdot\log n/\log((1-c)\lambda_1)$.

Compute $N_{r''[G\setminus E]}(v[i])$ for each $r'' \le r + \eta$ and $0 \le i < m$.

Run Vertex_comparison_algorithm(v[i], v[j], r, r', E, x, c) for every i and j.

If these give results consistent with some community memberships which indicate that there is at least one vertex in each community in v[], randomly select one alleged member v'[σ] of each community σ. Otherwise, fail.

For every v'' in the graph, compute $N_{r''[G\setminus E]}(v'')$ for each $r'' \le r'$. Then, run Vertex_classification_algorithm(v'[], v'', r, r', E, x, c) in order to get a hypothesized classification of v''.

Return the resulting classification.


Lemma 9. For $\epsilon < 1$ this algorithm runs in $O(m^2n^{1-\epsilon/3} + n^{1+2\epsilon/3})$ average time. Assume that all of the following hold:

$$(1-c)\lambda_\eta^4 > 4\lambda_1^3$$
$$0 < x < \frac{\lambda_1\sqrt{k}}{\lambda_\eta\min_i p_i}$$
$$\left(2(1-c)\lambda_1^3/\lambda_\eta^2\right)^{1-\epsilon/3} < (1-c)\lambda_1$$
$$1 + \epsilon/3 > \log((1-c)\lambda_1)/\log((1-c)\lambda_\eta^2/\lambda_1)$$
$$13(2x(\min_j p_j)^{-1/2} + x^2) < \min\,(w_i(\{v\}) - w_i(\{v'\})) \cdot P^{-1}(w_i(\{v\}) - w_i(\{v'\})),$$

where the minimum is over nonzero values, and let

$$y = \frac{2k\exp\left(-\frac{x^2(1-c)\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\right)}{1 - \exp\left(-\frac{x^2(1-c)\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{(1-c)\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right)}.$$

With probability $1 - o(1)$, G is such that Unreliable_graph_classification_algorithm(G, c, m, ε, x) has at least a

$$1 - k(1 - \min_i p_i)^m - my$$

chance of classifying at least $1 - y$ of G's vertices correctly.


Proof. Generating E and v[] takes O(n) time. Computing $N_{r''[G\setminus E]}(v[i])$ for all $r'' \le r + \eta$ takes $O(m|\cup_{r''}N_{r''[G\setminus E]}(v[i])|) = O(mn)$ time. Computing $N_{r''[G\setminus E]}(v')$ for all $r'' \le r'$ and $v' \in G$ takes

$$O\Big(n|\cup_{r'' \le r'}N_{r''[G\setminus E]}|\Big) = O(n \cdot ((1-c)\lambda_1)^{r'}) = O(n^{1+2\epsilon/3})$$

time. Once these have been computed, running Vertex_comparison_algorithm(v[i], v[j], r, r', E, x, c) for every i and j takes $O(m^2 \cdot ((1-c)\lambda_1)^{r'}) = O(m^2n^{1-\epsilon/3})$ time, since $2\epsilon/3 < 1 - \epsilon/3$ for $\epsilon < 1$. At that point an alleged member of each community can be found in $O(m^2)$ time. Running Vertex_classification_algorithm on (v'[], v'', r, r', E, x, c) for every $v'' \in G$ takes $O(n \cdot ((1-c)\lambda_1)^{r'}) = O(n^{1+2\epsilon/3})$ time. So, the overall algorithm runs in $O(m^2n^{1-\epsilon/3} + n^{1+2\epsilon/3})$ average time.

There exists $y' < y$ such that, if these conditions hold, then with probability $1 - o(1)$:

- at least $1 - y'$ of G's vertices are $(r+\eta,x)$-good;
- the number of vertices of G in community $\sigma$ is within $\sqrt{n\log n}$ of $p_\sigma n$ for all $\sigma$.

If this is the case, then for sufficiently large n, with probability at least $1 - k(1 - \min_j p_j)^m - my$, every one of the m randomly selected vertices is $(r+\eta,x)$-good and at least one is selected from each community. If that happens, then with probability $1 - o(1)$, Vertex_comparison_algorithm(v[i], v[j], r, r', E, x, c) correctly determines whether or not v[i] and v[j] are in the same community for every i and j. This allows the algorithm to pick one member of each community. If that happens, then the algorithm will classify each $(r'+\eta,x)$-good vertex correctly with probability $1 - o(1)$. So, as long as the initial selection of v[] is good, the algorithm classifies at least $1 - y$ of the graph's vertices correctly with probability $1 - o(1)$. □


So, this algorithm can sometimes give a vertex classification that is nontrivially better than that obtained by guessing, but it has an asymptotically nonzero failure rate. In order to get around that, we combine the results of multiple executions of the algorithm as follows.
Reliable_graph_classification_algorithm(G, c, m, ε, x, T(n)) (i.e., Sphere-comparison):

Run Unreliable_graph_classification_algorithm(G, c, m, ε, x) T(n) times and record the resulting classifications.

Discard any classification that has greater than

$$\frac{4k\exp\left(-\frac{x^2(1-c)\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\right)}{1 - \exp\left(-\frac{x^2(1-c)\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{(1-c)\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right)}$$

disagreement with more than half of the other classifications. In this step, define the disagreement between two classifications as the minimum disagreement over all bijections between their communities.

For every vertex in G, randomly pick one of the remaining classifications and assert that the vertex is in the community claimed by that classification. Here a community from one classification is assumed to correspond to the community it has the greatest overlap with in each other classification.

Return the resulting combined classification.
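The discard step relies on a disagreement between two classifications that does not depend on how their communities are labelled. For a small number k of communities, this can be computed by brute force over all k! relabellings. The Python sketch below does that and then applies the majority filter. The function names and the brute-force choice are ours; for larger k one would instead match communities through their overlap counts.

```python
from itertools import permutations


def disagreement(a, b, k):
    """Fraction of vertices on which labellings a and b differ, minimised over bijections of b's labels."""
    n = len(a)
    best = n
    for perm in permutations(range(k)):
        mismatches = sum(1 for x, y in zip(a, b) if x != perm[y])
        best = min(best, mismatches)
    return best / n


def filter_classifications(classifications, k, threshold):
    """Keep a classification unless it disagrees by more than threshold with more than half of the others."""
    kept = []
    for i, a in enumerate(classifications):
        far = sum(1 for j, b in enumerate(classifications)
                  if j != i and disagreement(a, b, k) > threshold)
        if far <= (len(classifications) - 1) / 2:
            kept.append(a)
    return kept


good1, good2 = [0, 0, 1, 1], [1, 1, 0, 0]      # the same partition, relabelled
bad = [0, 1, 0, 1]
print(disagreement(good1, good2, 2))            # 0.0
print(disagreement(good1, bad, 2))              # 0.5
print(len(filter_classifications([good1, good2, good1, bad], 2, 0.25)))  # 3
```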


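Lemma 10. For $\epsilon < 1$ this algorithm runs in $O(m^2n^{1-\epsilon/3}T(n) + n^{1+2\epsilon/3}T(n) + nT^2(n))$ average time. Let

$$y = \frac{2k\exp\left(-\frac{x^2(1-c)\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\right)}{1 - \exp\left(-\frac{x^2(1-c)\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{(1-c)\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right)},$$

and assume that all of the following hold:

$$(1-c)\lambda_\eta^4 > 4\lambda_1^3$$
$$0 < x < \frac{\lambda_1\sqrt{k}}{\lambda_\eta\min_i p_i}$$
$$\left(2(1-c)\lambda_1^3/\lambda_\eta^2\right)^{1-\epsilon/3} < (1-c)\lambda_1$$
$$1 + \epsilon/3 > \log((1-c)\lambda_1)/\log((1-c)\lambda_\eta^2/\lambda_1)$$
$$k(1 - \min_i p_i)^m + my < \frac{1}{2}$$
$$T(n) = \omega(1)$$
$$\min_i p_i > 6y$$
$$13(2x(\min_j p_j)^{-1/2} + x^2) < \min\,(w_i(\{v\}) - w_i(\{v'\})) \cdot P^{-1}(w_i(\{v\}) - w_i(\{v'\}))$$

Then Reliable_graph_classification_algorithm(G, c, m, ε, x, T(n)) classifies at least

$$1 - 2y$$

of G's vertices correctly with probability $1 - o(1)$.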
Proof. It takes $O(m^2n^{1-\epsilon/3}T(n) + n^{1+2\epsilon/3}T(n))$ time to run

Unreliable_graph_classification_algorithm(G, c, m, ε, x)

T(n) times. It takes O(n) time to determine the best bijection between two classifications' communities and compute their disagreement. So, it takes $O(nT^2(n))$ time to compute all of the disagreements. Then, it takes O(n) time to combine them and output the result. Therefore, this algorithm takes $O(m^2n^{1-\epsilon/3}T(n) + n^{1+2\epsilon/3}T(n) + nT^2(n))$ average time.

Assuming the conditions are met, with probability $1 - o(1)$, G is such that Unreliable_graph_classification_algorithm on (G, c, m, ε, x) has at least a

$$1 - k(1 - \min_i p_i)^m - my > \frac{1}{2}$$

chance of giving a good classification each time it is run. Since $T(n) = \omega(1)$, the majority of the classifications it generates will be good with probability $1 - o(1)$.

Each good classification has error at most y. So any classification with error greater than 3y will have disagreement greater than 2y with every good classification. On the flip side, no two good classifications can have disagreement greater than 2y. So, if the majority of the classifications are good, none of the good classifications will be discarded, and any classification with error greater than 3y will be discarded.

The requirement that $\min_i p_i > 6y$ ensures that, for any two of the remaining classifications, the bijection between their communities that minimizes their disagreement is the correct bijection. More than half of the remaining classifications are good, with error at most y, and the others have error at most 3y. So classifying each vertex according to one of the remaining classifications chosen at random has a misclassification rate less than $\frac{1}{2}y + \frac{1}{2}\cdot 3y = 2y$. Therefore, with probability $1 - o(1)$, this algorithm classifies at least $1 - 2y$ of the vertices correctly, as desired. □


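Proof of Theorem 4. If the conditions hold, then for all sufficiently small c, Reliable_graph_classification_algorithm(G, c, ln(4k)/min_j p_j, ε, x, log n) classifies at least

$$1 - \frac{4k\exp\left(-\frac{(1-c)x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\right)}{1 - \exp\left(-\frac{(1-c)x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{(1-c)\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right)}$$

of G's vertices correctly with probability $1 - o(1)$. Furthermore, it runs in $O(n^{1+2\epsilon/3}\log n)$ time. Thus, we can get the accuracy arbitrarily close to

$$1 - \frac{4k\exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\right)}{1 - \exp\left(-\frac{x^2\lambda_\eta^2\min_i p_i}{16\lambda_1k^{3/2}((\min_i p_i)^{-1/2}+x)}\left(\frac{\lambda_\eta^4}{4\lambda_1^3} - 1\right)\right)}$$

as desired. □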

8 Exact recovery 

Recall that p is a probability vector of dimension k, Q is a k × k symmetric matrix with positive entries, and $G_2(n,p,Q)$ denotes the stochastic block model with community prior p and connectivity matrix $\ln(n)Q/n$. A random graph G drawn under $G_2(n,p,Q)$ has a planted community assignment, which we denote by $\sigma \in [k]^n$ and sometimes call the true community assignment.
Recall also that exact recovery is solvable for a community partition $[k] = \sqcup_{s=1}^t A_s$ if there exists an algorithm that assigns to each node in G an element of $\{A_1, \dots, A_t\}$ that contains its true community¹³ with probability $1 - o_n(1)$. Exact recovery is solvable in SBM(n,p,Q) if it is solvable for the partition of [k] into k singletons, i.e., all communities can be recovered.

8.1 Formal results 

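Definition 11. Let $\mu, \nu$ be two positive measures on a discrete set $\mathcal{X}$, i.e., two functions from $\mathcal{X}$ to $\mathbb{R}_+$. We define the CH-divergence between $\mu$ and $\nu$ by

$$D_+(\mu,\nu) := \max_{t \in [0,1]}\sum_{x \in \mathcal{X}}\left(t\mu(x) + (1-t)\nu(x) - \mu(x)^t\nu(x)^{1-t}\right). \qquad (27)$$

Note that for a fixed t,

$$\sum_{x \in \mathcal{X}}\left(t\mu(x) + (1-t)\nu(x) - \mu(x)^t\nu(x)^{1-t}\right)$$

is an f-divergence. For t = 1/2, i.e., the gap between the arithmetic and geometric means, we have

$$\sum_{x \in \mathcal{X}}\left(\frac{1}{2}\mu(x) + \frac{1}{2}\nu(x) - \sqrt{\mu(x)\nu(x)}\right) = \frac{1}{2}\|\sqrt{\mu} - \sqrt{\nu}\|_2^2, \qquad (28)$$

which is the Hellinger divergence (or distance). The optimization over t of the part $\sum_x\mu(x)^t\nu(x)^{1-t}$ is related to the Chernoff divergence: for probability measures, $\min_t\sum_x\mu(x)^t\nu(x)^{1-t}$ is the exponential of minus the Chernoff divergence. We refer to Section 8.3 for further discussion of $D_+$. Note also that we will often evaluate $D_+$ as $D_+(x,y)$, where x, y are vectors instead of measures.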
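Since $t \mapsto \mu(x)^t\nu(x)^{1-t}$ is convex, each summand in (27) is concave in t. So $D_+$ can be evaluated by a one-dimensional concave maximization. Below is a minimal Python sketch using ternary search, applied to rows of $PQ$. The method and the names are our choice, and any scalar optimizer would do.

```python
def ch_divergence(mu, nu, iters=200):
    """D_+(mu, nu) = max over t in [0, 1] of sum_x t mu(x) + (1 - t) nu(x) - mu(x)^t nu(x)^(1 - t).

    The objective is concave in t, so ternary search converges to the maximum.
    mu, nu: sequences of positive numbers of equal length.
    """
    def f(t):
        return sum(t * a + (1 - t) * b - a ** t * b ** (1 - t) for a, b in zip(mu, nu))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1          # the maximizer lies in [m1, hi]
        else:
            hi = m2          # the maximizer lies in [lo, m2]
    return f((lo + hi) / 2)


# Rows of PQ for p = (1/3, 1/3, 1/3), alpha = 9 on the diagonal, beta = 1 off it.
row1, row2 = [3.0, 1 / 3, 1 / 3], [1 / 3, 3.0, 1 / 3]
print(round(ch_divergence(row1, row2), 6))   # 1.333333
```

The value $4/3$ matches the symmetric case treated in Example 3: $(\sqrt{9}-\sqrt{1})^2/3 = 4/3$.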

Definition 12. For the SBM $G_2(n,p,Q)$, where p has dimension k (i.e., there are k communities), the finest partition of [k] is the partition of [k] into the largest number of subsets such that $D_+((PQ)_i,(PQ)_j) \ge 1$ for all i, j that are in different subsets.

We next present our main theorem for exact recovery. We first provide necessary and sufficient conditions for exact recovery of partitions, and then provide an algorithm that solves exact recovery efficiently, more precisely, in quasi-linear time.

Theorem 6. Let $k \in \mathbb{Z}_+$ denote the number of communities, $p \in (0,1)^k$ with $|p| = 1$ denote the community prior, $P = \mathrm{diag}(p)$, and let $Q \in (0,\infty)^{k \times k}$ be symmetric with no two rows equal.

• Exact recovery is solvable in the stochastic block model $G_2(n,p,Q)$ for a partition $[k] = \sqcup_{s=1}^t A_s$ if and only if for all i and j in different subsets of the partition,

$$D_+((PQ)_i,(PQ)_j) \ge 1, \qquad (29)$$

where $(PQ)_i$ denotes the i-th row of the matrix PQ. In particular, exact recovery is solvable in $G_2(n,p,Q)$ if and only if $\min_{i,j \in [k], i \ne j}D_+((PQ)_i,(PQ)_j) \ge 1$.


13 Up to a relabelling of the communities.
• For $G \sim G_2(n,p,Q)$, the algorithm Degree-profiling(G, p, Q, γ) (see below¹⁴) recovers the finest partition with probability $1 - o_n(1)$ and runs in $o(n^{1+\epsilon})$ time for all $\epsilon > 0$.

Note that the second item in the theorem implies that Degree-profiling solves exact recovery efficiently whenever the parameters p and Q allow for exact recovery to be solvable. In addition, it gives an operational meaning to a new divergence function, analogous to the operational meaning given to the KL-divergence in the channel coding theorem (see Section 2.3).


Remark 1. If $Q_{ij} = 0$ for some i and j, then the results above still hold, with one exception. Suppose that for all i and j in different subsets of the partition,

$$D_+((PQ)_i,(PQ)_j) \ge 1, \qquad (30)$$

but there exist i and j in different subsets of the partition such that $D_+((PQ)_i,(PQ)_j) = 1$ and $(PQ)_{i,\ell}\cdot(PQ)_{j,\ell}\cdot((PQ)_{i,\ell} - (PQ)_{j,\ell}) = 0$ for all $\ell \in [k]$. Then the optimal algorithm will have an asymptotically constant failure rate. The recovery algorithm also needs to be modified to accommodate 0's in Q.

Remark 2. As shown in the proof of Theorem 6, when exact recovery is not solvable, any algorithm must confuse at least one vertex with probability $1 - o_n(1)$, and not just with probability bounded away from 0. Hence exact recovery has a sharp threshold at $\min_{i,j \in [k], i \ne j}D_+((PQ)_i,(PQ)_j) = 1$.

Example 3. Consider the symmetric block model where p is equiprobable on [k] and Q takes only two different values, α on the diagonal and β outside the diagonal (with α, β > 0). The requirement in Theorem 6 for recovery of any (or all) communities is equivalent to

$$|\sqrt{\alpha} - \sqrt{\beta}| \ge \sqrt{k}, \qquad (31)$$

which generalizes the result obtained in [ABH14, MNSb] for k = 2.
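The equivalence in Example 3 follows from a short computation, sketched here for completeness. Rows i and j of PQ differ only in coordinates i and j, and the remaining coordinates contribute 0 to (27).

```latex
(PQ)_i = \tfrac{1}{k}(\beta,\dots,\underbrace{\alpha}_{i},\dots,\beta), \qquad
(PQ)_j = \tfrac{1}{k}(\beta,\dots,\underbrace{\alpha}_{j},\dots,\beta),
\\[4pt]
D_+((PQ)_i,(PQ)_j)
  = \frac{1}{k}\max_{t\in[0,1]}\Big(\alpha+\beta-\alpha^{t}\beta^{1-t}-\alpha^{1-t}\beta^{t}\Big)
  = \frac{1}{k}\big(\alpha+\beta-2\sqrt{\alpha\beta}\big)
  = \frac{(\sqrt{\alpha}-\sqrt{\beta})^2}{k}.
```

The maximum is attained at t = 1/2 because the expression is concave in t and symmetric under $t \mapsto 1 - t$. Hence $D_+ \ge 1$ if and only if $|\sqrt{\alpha}-\sqrt{\beta}| \ge \sqrt{k}$.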

The algorithm Degree-profiling is given in Section 3.1 and replicated below. The idea is to recover the communities with a two-step procedure, similar to one of the algorithms used in [ABH14] for the two-community case.

In the first step, we run Sphere-comparison on a sparsified version of $G_2(n,p,Q)$ which has a slowly growing average degree. Hence, from Corollary 5, Sphere-comparison correctly recovers a fraction of nodes that is arbitrarily close to 1 (w.h.p.).

In the second step, we improve the first-step classification by making local checks for each node in the residual graph and deciding whether the node should be moved to another community or not. This step requires solving a hypothesis testing problem that decides the community of a vertex from its local degree profile in the SBM. The CH-divergence appears when resolving this problem, as the misclassification error exponent. We present this result, which is of independent interest, in Section 8.2. The proof of Theorem 6 is given in Section 8.3.

14 γ = γ(n,P,Q) is set to …, where $\Delta = \min_{r,s \in [t], r \ne s}\min_{i \in A_r, j \in A_s}D_+((PQ)_i,(PQ)_j)$ and $A_1, \dots, A_t$ is the finest partition of [k].


Degree-profiling algorithm.

Inputs: a graph $g=([n],E)$, the SBM parameters $p_i$, $i\in[k]$, $Q_{ij}$, $i,j\in[k]$, and a splitting parameter $\gamma\in[0,1]$ (see Theorem 6 for the choice of $\gamma$).

Output: Each node $v\in[n]$ is assigned a community-list $A(v)\in\{A_1,\ldots,A_t\}$, where $A_1,\ldots,A_t$ is the partition of $[k]$ into the largest number of subsets such that $D_+((PQ)_i,(PQ)_j)\ge 1$ for all $i,j$ in $[k]$ that are in different subsets.

Algorithm:

(1) Define the graph $g'$ on the vertex set $[n]$ by selecting each edge in $g$ independently with probability $\gamma$, and define the graph $g''$ that contains the edges in $g$ that are not in $g'$.

(2) Run Sphere-comparison on $g'$ to obtain the preliminary classification $\sigma'\in[k]^n$ (see Section 7 and Corollary 5).

(3) Determine the edge density between each pair of alleged communities, and use this information and the alleged communities' sizes to attempt to identify the communities up to symmetry.

(4) For each node $v\in[n]$, determine which community node $v$ most likely belongs to based on its degree profile computed from the preliminary classification $\sigma'$ (see Section 8.2), and call it $\sigma''_v$.

(5) For each node $v\in[n]$, determine which group $A_1,\ldots,A_t$ node $v$ most likely belongs to based on its degree profile computed from the preliminary classification $\sigma''$ (see Section 8.2).
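Steps (1) and (4) above can be sketched in Python as follows, under simplifying assumptions that are not part of the paper: Sphere-comparison is not implemented, and a hypothetical stand-in (the planted labels with 10% of them flipped) plays the role of the preliminary classification $\sigma'$; step (3) is skipped because the stand-in labels are already aligned with the true communities; and step (4) scores each community $j$ with the Poisson approximation of mean $(1-\gamma)\ln(n)p_iQ_{ij}$ for coordinate $i$. All function names are illustrative.

```python
import math
import random


def sample_sbm(n, p, Q, rng):
    """Draw G_2(n, p, Q): i.i.d. labels from p, edge {u, w} with probability ln(n) Q / n."""
    labels = rng.choices(range(len(p)), weights=p, k=n)
    edges = [(u, w) for u in range(n) for w in range(u + 1, n)
             if rng.random() < math.log(n) * Q[labels[u]][labels[w]] / n]
    return labels, edges


def split_edges(edges, gamma, rng):
    """Step (1): each edge goes to g' with probability gamma, otherwise to g''."""
    g1, g2 = [], []
    for e in edges:
        (g1 if rng.random() < gamma else g2).append(e)
    return g1, g2


def log_poisson(d, lam):
    """Log-pmf of independent Poisson coordinates with means lam."""
    return sum(x * math.log(l) - l - math.lgamma(x + 1) for x, l in zip(d, lam))


def refine(n, g2, prelim, p, Q, gamma):
    """Step (4): MAP community of each vertex from its degree profile in g'',
    where neighbours are counted according to the preliminary labels."""
    k = len(p)
    adj = [[] for _ in range(n)]
    for u, w in g2:
        adj[u].append(w)
        adj[w].append(u)
    scale = (1 - gamma) * math.log(n)
    refined = []
    for v in range(n):
        d = [0] * k
        for u in adj[v]:
            d[prelim[u]] += 1
        scores = [math.log(p[j]) + log_poisson(d, [scale * p[i] * Q[i][j] for i in range(k)])
                  for j in range(k)]
        refined.append(max(range(k), key=scores.__getitem__))
    return refined


rng = random.Random(0)
n, p, Q, gamma = 600, [0.5, 0.5], [[20.0, 1.0], [1.0, 20.0]], 0.1
truth, edges = sample_sbm(n, p, Q, rng)
g1, g2 = split_edges(edges, gamma, rng)
# Hypothetical stand-in for Sphere-comparison on g1 (k = 2): flip 10% of the true labels.
prelim = [1 - c if rng.random() < 0.1 else c for c in truth]
refined = refine(n, g2, prelim, p, Q, gamma)
print(sum(a == b for a, b in zip(refined, truth)) / n)
```

With these parameters $D_+=(\sqrt{20}-1)^2/2$ is far above 1, so the second round typically corrects the errors planted in the preliminary labels.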

8.2 Testing degree profiles 

In this section, we consider the problem of deciding which community a node in the SBM 
belongs to based on its degree profile. We first make the latter terminology precise. 

Definition 13. The degree profile of a node $v\in[n]$ for a partition of the graph's vertices into $k$ communities is the vector $d(v)\in\mathbb{Z}_+^k$, where the $j$-th component $d_j(v)$ counts the number of edges between $v$ and the vertices in community $j$. Note that $d(v)$ is equal to $N_1(v)$ as defined in Definition f.

For $G\sim G_2(n,p,Q)$, community $i\in[k]$ has a relative size that concentrates exponentially fast to $p_i$. Hence, for a node $v$ in community $j$, $d(v)$ is approximately given by $\sum_{i\in[k]}X_{ij}e_i$, where the $X_{ij}$ are independent and distributed as $\mathrm{Bin}(np_i,\ln(n)Q_{ij}/n)$, and where $\mathrm{Bin}(a,b)$ denotes¹⁵ the binomial distribution with $a$ trials and success probability $b$. Moreover, the binomial is well approximated by a Poisson distribution of the same mean in this regime. In particular, Le Cam's inequality gives

$$\left\|\mathrm{Bin}\!\left(np_i,\frac{\ln(n)Q_{ij}}{n}\right)-\mathcal{P}\big(\ln(n)p_iQ_{ij}\big)\right\|_{TV}=O\!\left(\frac{\ln^2(n)}{n}\right), \tag{32}$$

15 $\mathrm{Bin}(a,b)$ refers to $\mathrm{Bin}(\lfloor a\rfloor,b)$ if $a$ is not an integer.
hence, by the additivity of the Poisson distribution and the triangle inequality,

$$\left\|\mathcal{P}_{d(v)}-\mathcal{P}\Big(\ln(n)\sum_{i\in[k]}p_iQ_{ij}e_i\Big)\right\|_{TV}=O\!\left(\frac{\ln^2(n)}{n}\right). \tag{33}$$
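To see the size of the approximation in (32) concretely, the following Python sketch (illustrative only) computes the total-variation distance, taken as half the $L^1$ distance, between $\mathrm{Bin}(m,q)$ and $\mathcal{P}(mq)$ and compares it with the standard form of Le Cam's bound, $mq^2$. The choice $p_i=1/2$, $Q_{ij}=2$ is a hypothetical example; then $mq^2=2\ln^2(n)/n$, matching the $O(\ln^2(n)/n)$ rate.

```python
import math


def log_binom_pmf(m, q, x):
    """log of the Bin(m, q) pmf at x, computed with lgamma to avoid overflow."""
    return (math.lgamma(m + 1) - math.lgamma(x + 1) - math.lgamma(m - x + 1)
            + x * math.log(q) + (m - x) * math.log1p(-q))


def log_poisson_pmf(lam, x):
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def tv_binom_poisson(m, q):
    """Total variation (half the L1 distance) between Bin(m, q) and Poisson(m q)."""
    lam = m * q
    l1 = sum(abs(math.exp(log_binom_pmf(m, q, x)) - math.exp(log_poisson_pmf(lam, x)))
             for x in range(m + 1))
    # Poisson mass above m, where the binomial pmf vanishes.
    poisson_tail = max(0.0, 1.0 - sum(math.exp(log_poisson_pmf(lam, x)) for x in range(m + 1)))
    return 0.5 * (l1 + poisson_tail)


for n in (1_000, 4_000, 16_000):
    m, q = n // 2, 2 * math.log(n) / n  # p_i = 1/2, Q_ij = 2 (illustrative values)
    print(n, tv_binom_poisson(m, q), m * q * q)  # distance vs Le Cam bound m q^2
```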

We will rely on a simple one-sided bound (see (71)) to approximate our events under the 
Poisson measure. 

Consider now the following problem. Let $G$ be drawn under the $G_2(n,p,Q)$ SBM and assume that the planted partition is revealed except for a given vertex. Based on the degree profile of that vertex, is it possible to classify the vertex correctly with high probability? We have to resolve a hypothesis testing problem, which involves multivariate Poisson distributions in view of the previous observations. We next study this problem.


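Testing multivariate Poisson distributions. Consider the following Bayesian hypothesis testing problem with $k$ hypotheses. The random variable $H$ takes values in $[k]$ with $\mathbb{P}\{H=j\}=p_j$ (this is the a priori distribution of $H$). Under $H=j$, an observed random variable $D$ is drawn from a multivariate Poisson distribution with mean $\lambda(j)\in\mathbb{R}_+^k$, i.e.,

$$\mathbb{P}\{D=d\mid H=j\}=\mathcal{P}_{\lambda(j)}(d),\quad d\in\mathbb{Z}_+^k, \tag{34}$$

where

$$\mathcal{P}_{\lambda(j)}(d)=\prod_{i\in[k]}\mathcal{P}_{\lambda_i(j)}(d_i), \tag{35}$$

and

$$\mathcal{P}_{\lambda_i(j)}(d_i)=\frac{\lambda_i(j)^{d_i}}{d_i!}e^{-\lambda_i(j)}. \tag{36}$$

In other words, $D$ has independent Poisson entries with different means. We use the following notation to summarize the above setting:

$$D\mid H=j\sim\mathcal{P}(\lambda(j)),\quad j\in[k]. \tag{37}$$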

Our goal is to infer the value of $H$ by observing a realization of $D$. To minimize the error probability given a realization of $D$, we must pick the most likely hypothesis conditioned on this realization, i.e.,

$$\arg\max_{j\in[k]}\mathbb{P}\{D=d\mid H=j\}\,p_j, \tag{38}$$

which is the Maximum A Posteriori (MAP) decoding rule.¹⁶ To resolve this maximization, we can run a tournament of $k-1$ pairwise comparisons of the hypotheses. Each comparison allows us to eliminate one candidate for the maximum, i.e.,

$$\mathbb{P}\{D=d\mid H=i\}\,p_i>\mathbb{P}\{D=d\mid H=j\}\,p_j\ \Rightarrow\ H\neq j. \tag{39}$$
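The tournament form of the MAP rule (38)-(39) can be sketched in Python as below; this is an illustration with hypothetical means and priors, working with log-likelihoods for numerical stability. Ties keep the current champion, which is one admissible way of breaking them.

```python
import math


def log_likelihood(d, lam):
    """log P_lambda(d) for independent Poisson coordinates, cf. (35)-(36)."""
    return sum(x * math.log(l) - l - math.lgamma(x + 1) for x, l in zip(d, lam))


def map_tournament(d, lams, priors):
    """MAP rule (38) computed by k - 1 pairwise comparisons as in (39).

    The champion is eliminated only if the challenger has a strictly larger
    posterior weight, so ties are broken in favour of the lower index.
    """
    champion = 0
    for j in range(1, len(priors)):
        challenger = log_likelihood(d, lams[j]) + math.log(priors[j])
        incumbent = log_likelihood(d, lams[champion]) + math.log(priors[champion])
        if challenger > incumbent:
            champion = j
    return champion


lams = [[10.0, 1.0, 1.0], [1.0, 10.0, 1.0], [1.0, 1.0, 10.0]]
priors = [1 / 3, 1 / 3, 1 / 3]
print(map_tournament([9, 2, 0], lams, priors))   # 0
print(map_tournament([0, 1, 12], lams, priors))  # 2
```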


16 Ties can be broken arbitrarily.







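The error probability $P_e$ of this decoding rule is then given by

$$P_e=\sum_{i\in[k]}\mathbb{P}\{D\in\mathrm{Bad}(i)\mid H=i\}\,p_i, \tag{40}$$

where $\mathrm{Bad}(i)$ is the region in $\mathbb{Z}_+^k$ where $i$ does not maximize (38). Moreover, for any $i\in[k]$,

$$\mathbb{P}\{D\in\mathrm{Bad}(i)\mid H=i\}\le\sum_{j\neq i}\mathbb{P}\{D\in\mathrm{Bad}_j(i)\mid H=i\}, \tag{41}$$

where $\mathrm{Bad}_j(i)$ is the region in $\mathbb{Z}_+^k$ where $\mathbb{P}\{D=x\mid H=i\}p_i<\mathbb{P}\{D=x\mid H=j\}p_j$. Note that with this upper bound, we are counting the overlap regions where $\mathbb{P}\{D=x\mid H=i\}p_i<\mathbb{P}\{D=x\mid H=j\}p_j$ for different $j$'s multiple times, but no more than $k-1$ times. Hence,

$$\sum_{j\neq i}\mathbb{P}\{D\in\mathrm{Bad}_j(i)\mid H=i\}\le(k-1)\,\mathbb{P}\{D\in\mathrm{Bad}(i)\mid H=i\}. \tag{42}$$

Putting (40) and (41) together, we have

$$P_e\le\sum_{i\neq j}\mathbb{P}\{D\in\mathrm{Bad}_j(i)\mid H=i\}\,p_i \tag{43}$$

$$\le\sum_{i<j}\sum_{d\in\mathbb{Z}_+^k}\min\big(\mathbb{P}\{D=d\mid H=i\}p_i,\ \mathbb{P}\{D=d\mid H=j\}p_j\big), \tag{44}$$

and from (42),

$$P_e\ge\frac{1}{k-1}\sum_{i<j}\sum_{d\in\mathbb{Z}_+^k}\min\big(\mathbb{P}\{D=d\mid H=i\}p_i,\ \mathbb{P}\{D=d\mid H=j\}p_j\big). \tag{45}$$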
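The bounds (44) and (45) can be checked numerically on a small instance. The Python sketch below (illustrative, with hypothetical means and priors) uses the identity that the MAP error equals $\sum_d\big(\sum_i w_i(d)-\max_i w_i(d)\big)$ with $w_i(d)=\mathbb{P}\{D=d\mid H=i\}p_i$, and truncates the sum over $d$ at a cutoff where the neglected mass is negligible.

```python
import itertools
import math


def pmf(d, lam):
    """Multivariate Poisson pmf with independent coordinates, cf. (35)-(36)."""
    return math.exp(sum(x * math.log(l) - l - math.lgamma(x + 1) for x, l in zip(d, lam)))


def error_and_bounds(lams, priors, cutoff):
    """Exact MAP error P_e (truncated to d < cutoff) with the bounds (45) and (44)."""
    k = len(priors)
    pe = pairwise = 0.0
    for d in itertools.product(range(cutoff), repeat=len(lams[0])):
        w = [pmf(d, lam) * p for lam, p in zip(lams, priors)]
        pe += sum(w) - max(w)  # posterior mass of the hypotheses that MAP rejects
        pairwise += sum(min(w[i], w[j]) for i in range(k) for j in range(i + 1, k))
    return pe, pairwise / (k - 1), pairwise


pe, lower, upper = error_and_bounds([[3.0, 1.0], [1.0, 3.0], [2.0, 2.0]], [0.5, 0.3, 0.2], cutoff=30)
print(lower <= pe <= upper)  # True
```

For $k=2$ the two bounds coincide with $P_e$, since then both reduce to $\sum_d\min(w_1(d),w_2(d))$.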


Therefore the error probability $P_e$ can be controlled by estimating the terms $\min(\mathbb{P}\{D=d\mid H=i\}p_i,\ \mathbb{P}\{D=d\mid H=j\}p_j)$. In our case, recall that

$$\mathbb{P}\{D=d\mid H=i\}=\mathcal{P}_{\lambda(i)}(d), \tag{46}$$

which is a multivariate Poisson distribution. In particular, we are interested in the regime where $k$ is constant and $\lambda(i)=\ln(n)c_i$, $c_i\in\mathbb{R}_+^k$, and $n$ diverges. Due to (44) and (45), we can then control the error probability by controlling $\sum_{x\in\mathbb{Z}_+^k}\min(\mathcal{P}_{\ln(n)c_i}(x)p_i,\ \mathcal{P}_{\ln(n)c_j}(x)p_j)$, which we will want to be $o(1/n)$ to classify vertices in the SBM correctly with high probability based on their degree profiles (see next section). The following lemma provides the relevant estimates.


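Lemma 11. For any $c_1,c_2\in(\mathbb{R}_+\setminus\{0\})^k$ with $c_1\neq c_2$ and $p_1,p_2\in\mathbb{R}_+\setminus\{0\}$,

$$\sum_{x\in\mathbb{Z}_+^k}\min\big(\mathcal{P}_{\ln(n)c_1}(x)p_1,\ \mathcal{P}_{\ln(n)c_2}(x)p_2\big)=O\Big(n^{-D_+(c_1,c_2)-\frac{\ln\ln(n)}{2\ln(n)}}\Big), \tag{47}$$

$$\sum_{x\in\mathbb{Z}_+^k}\min\big(\mathcal{P}_{\ln(n)c_1}(x)p_1,\ \mathcal{P}_{\ln(n)c_2}(x)p_2\big)=\Omega\Big(n^{-D_+(c_1,c_2)-\frac{k\ln\ln(n)}{2\ln(n)}}\Big), \tag{48}$$

where $D_+(c_1,c_2)$ is the CH-divergence as defined in (27).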
In other words, the CH-divergence provides the error exponent for deciding among multivariate Poisson distributions. We did not find this result in the literature, but found a similar result obtained by Verdú [Ver86], who shows that the Hellinger distance (the special case with $t=1/2$ instead of the maximization over $t$) appears in the error exponent for testing Poisson point processes, although [Ver86] does not investigate the exact error exponent.
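The exponent in Lemma 11 can be observed numerically. The Python sketch below is illustrative only: it takes $k=1$, where (47) and (48) give the same exponent $D_+(c_1,c_2)+\frac{\ln\ln n}{2\ln n}$, with hypothetical values $c_1=1$, $c_2=4$, $p_1=p_2=1/2$, and it treats $\ln n$ directly as the parameter `log_n`. It computes $-\ln(S)/\ln(n)$ for the sum $S$ in the lemma, using log-sum-exp to avoid underflow.

```python
import math


def ch_divergence(mu, nu, iters=200):
    """D_+(mu, nu) by ternary search; the objective is concave in t."""
    def objective(t):
        return sum(t * a + (1 - t) * b - a ** t * b ** (1 - t) for a, b in zip(mu, nu))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)


def exponent_estimate(c1, c2, p1, p2, log_n):
    """-ln(S)/ln(n) with S = sum_x min(P_{ln(n) c1}(x) p1, P_{ln(n) c2}(x) p2), for k = 1."""
    lam1, lam2 = log_n * c1, log_n * c2
    # The min of the log-weights is the log of the min of the weights.
    logs = [min(x * math.log(lam1) - lam1 - math.lgamma(x + 1) + math.log(p1),
                x * math.log(lam2) - lam2 - math.lgamma(x + 1) + math.log(p2))
            for x in range(int(5 * max(lam1, lam2)) + 50)]
    top = max(logs)
    log_s = top + math.log(sum(math.exp(v - top) for v in logs))  # log-sum-exp
    return -log_s / log_n


c1, c2 = 1.0, 4.0
D = ch_divergence([c1], [c2])
for log_n in (100.0, 400.0, 1600.0):
    est = exponent_estimate(c1, c2, 0.5, 0.5, log_n)
    predicted = D + math.log(log_n) / (2 * log_n)  # exponent of (47)-(48) with k = 1
    print(log_n, round(est, 4), round(predicted, 4))
```

The gap between the estimate and the predicted exponent shrinks like $O(1/\ln n)$, reflecting the constants hidden in $O(\cdot)$ and $\Omega(\cdot)$.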

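Proof of Lemma 11. Assume without loss of generality that $c_{1,1}\neq c_{2,1}$. To prove the first half of the lemma, note that for any $t\in(0,1)$,

$$\sum_{x\in\mathbb{Z}_+^k}\min\big(\mathcal{P}_{\ln(n)c_1}(x)p_1,\ \mathcal{P}_{\ln(n)c_2}(x)p_2\big) \tag{49}$$

$$\le\max(p_1,p_2)\sum_{x\in\mathbb{Z}_+^k}\min\big(\mathcal{P}_{\ln(n)c_1}(x),\ \mathcal{P}_{\ln(n)c_2}(x)\big) \tag{50}$$

$$=\max(p_1,p_2)\sum_{x\in\mathbb{Z}_+^k}\min\Big(e^{-\ln n\sum_i c_{1,i}}\prod_i\frac{(\ln n\cdot c_{1,i})^{x_i}}{x_i!},\ e^{-\ln n\sum_i c_{2,i}}\prod_i\frac{(\ln n\cdot c_{2,i})^{x_i}}{x_i!}\Big) \tag{51}$$

$$=\max(p_1,p_2)\sum_{x\in\mathbb{Z}_+^k}e^{-\ln n\sum_i(tc_{1,i}+(1-t)c_{2,i})}\prod_i\frac{(\ln n\cdot c_{1,i}^tc_{2,i}^{1-t})^{x_i}}{x_i!} \tag{52}$$

$$\cdot\min\Big(e^{-\ln n\,(1-t)\sum_i(c_{1,i}-c_{2,i})}\prod_i\Big(\frac{c_{1,i}}{c_{2,i}}\Big)^{(1-t)x_i},\ e^{-\ln n\,t\sum_i(c_{2,i}-c_{1,i})}\prod_i\Big(\frac{c_{2,i}}{c_{1,i}}\Big)^{tx_i}\Big). \tag{53}$$

For any choice of $x_2,\ldots,x_k$, there must exist $x_1^*$ (not necessarily an integer) such that $e^{-\ln n\,(1-t)\sum_i(c_{1,i}-c_{2,i})}\prod_i(c_{1,i}/c_{2,i})^{(1-t)x_i}=1$ when $x_1=x_1^*$. As a result, the expression above must be less than or equal to

$$\max(p_1,p_2)\,e^{-\ln n\sum_i(tc_{1,i}+(1-t)c_{2,i})}\sum_{x\in\mathbb{Z}_+^k}\prod_i\frac{(\ln n\cdot c_{1,i}^tc_{2,i}^{1-t})^{x_i}}{x_i!}\cdot\min\Big(\Big(\frac{c_{1,1}}{c_{2,1}}\Big)^{(1-t)(x_1-x_1^*)},\Big(\frac{c_{2,1}}{c_{1,1}}\Big)^{t(x_1-x_1^*)}\Big) \tag{54}$$

$$=O\Big(e^{-\ln n\sum_i(tc_{1,i}+(1-t)c_{2,i}-c_{1,i}^tc_{2,i}^{1-t})}\big/\sqrt{\ln n}\Big), \tag{55}$$

where the last step uses that the minimum decays geometrically in $|x_1-x_1^*|$, that $\lambda^{x_1}/x_1!=O(e^{\lambda}/\sqrt{\lambda})$ for $\lambda=\ln n\cdot c_{1,1}^tc_{2,1}^{1-t}$, and that $\sum_{x_i}\lambda_i^{x_i}/x_i!=e^{\lambda_i}$ for $i\ge 2$. When $t$ is chosen to maximize $\sum_i tc_{1,i}+(1-t)c_{2,i}-c_{1,i}^tc_{2,i}^{1-t}$, this is

$$O\Big(n^{-D_+(c_1,c_2)}\big/\sqrt{\ln n}\Big)=O\Big(n^{-D_+(c_1,c_2)-\frac{\ln\ln n}{2\ln n}}\Big). \tag{56}$$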

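To prove the second half, let $t$ maximize $\sum_i tc_{1,i}+(1-t)c_{2,i}-c_{1,i}^tc_{2,i}^{1-t}$. Hence

$$\sum_i c_{1,i}-c_{2,i}-\ln(c_{1,i}/c_{2,i})\,c_{1,i}^tc_{2,i}^{1-t}=0.$$

This implies that

$$e^{-\ln n\sum_i c_{1,i}}\prod_i(\ln n\cdot c_{1,i})^{\ln n\,c_{1,i}^tc_{2,i}^{1-t}}=e^{-\ln n\sum_i c_{2,i}}\prod_i(\ln n\cdot c_{2,i})^{\ln n\,c_{1,i}^tc_{2,i}^{1-t}}.$$

As a result,

$$\min\big(\mathcal{P}_{\ln(n)c_1}(c_1^tc_2^{1-t}\ln n)\,p_1,\ \mathcal{P}_{\ln(n)c_2}(c_1^tc_2^{1-t}\ln n)\,p_2\big) \tag{57}$$

$$\ge\min(p_1,p_2)\,e^{-\ln n\sum_i c_{1,i}}\prod_i\frac{(\ln n\cdot c_{1,i})^{c_{1,i}^tc_{2,i}^{1-t}\ln n}}{(c_{1,i}^tc_{2,i}^{1-t}\ln n)!} \tag{58}$$

$$=\min(p_1,p_2)\,e^{-\ln n\sum_i c_{2,i}}\prod_i\frac{(\ln n\cdot c_{2,i})^{c_{1,i}^tc_{2,i}^{1-t}\ln n}}{(c_{1,i}^tc_{2,i}^{1-t}\ln n)!} \tag{59}$$

$$=\min(p_1,p_2)\,e^{-\ln n\sum_i(tc_{1,i}+(1-t)c_{2,i})}\prod_i\frac{(\ln n\cdot c_{1,i}^tc_{2,i}^{1-t})^{c_{1,i}^tc_{2,i}^{1-t}\ln n}}{(c_{1,i}^tc_{2,i}^{1-t}\ln n)!} \tag{60}$$

$$=\Omega\Big(\min(p_1,p_2)\,e^{-\ln n\sum_i(tc_{1,i}+(1-t)c_{2,i}-c_{1,i}^tc_{2,i}^{1-t})}\big/(\ln n)^{k/2}\Big) \tag{61}$$

$$=\Omega\Big(n^{-D_+(c_1,c_2)-\frac{k\ln\ln n}{2\ln n}}\Big), \tag{62}$$

where (61) follows from Stirling's formula $y^y/y!=\Theta(e^y/\sqrt{y})$ (rounding $c_{1,i}^tc_{2,i}^{1-t}\ln n$ to an integer changes these quantities only by constant factors). Thus,

$$\sum_{x\in\mathbb{Z}_+^k}\min\big(\mathcal{P}_{\ln(n)c_1}(x)p_1,\ \mathcal{P}_{\ln(n)c_2}(x)p_2\big)=\Omega\Big(n^{-D_+(c_1,c_2)-\frac{k\ln\ln n}{2\ln n}}\Big). \tag{63}$$

□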


This lemma together with the previous bounds on $P_e$ implies that if $D_+(c_i,c_j)>1$ for all $i\neq j$, the true hypothesis is recovered with error probability $o(1/n)$. However, it may be that $D_+(c_i,c_j)>1$ only for a subset of $(i,j)$-pairs. What can we then infer? While we may not recover the true value of $H$ with error probability $o(1/n)$, we may narrow down the search to a subset of possible hypotheses with that probability of error.
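Deciding which hypotheses can be separated amounts to grouping together every pair with $D_+<1$. A minimal Python sketch, assuming the columns $(PQ)_i$ with entries $p_\ell Q_{\ell i}$, computes the partition $A_1,\ldots,A_t$ used by Degree-profiling as the connected components of the graph on $[k]$ that joins $i$ and $j$ when $D_+((PQ)_i,(PQ)_j)<1$, using union-find. The function names and the example matrices are hypothetical.

```python
def ch_divergence(mu, nu, iters=200):
    """D_+(mu, nu) by ternary search; the objective is concave in t."""
    def objective(t):
        return sum(t * a + (1 - t) * b - a ** t * b ** (1 - t) for a, b in zip(mu, nu))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)


def finest_partition(p, Q, threshold=1.0):
    """Connected components of the graph on [k] joining i and j when
    D_+((PQ)_i, (PQ)_j) < threshold; these are the blocks A_1, ..., A_t."""
    k = len(p)
    cols = [[p[l] * Q[l][i] for l in range(k)] for i in range(k)]
    parent = list(range(k))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(k):
        for j in range(i + 1, k):
            if ch_divergence(cols[i], cols[j]) < threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(k):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(finest_partition([1 / 3] * 3, [[10, 9, 1], [9, 10, 1], [1, 1, 10]]))  # [[0, 1], [2]]
```

Two communities must share a block whenever their $D_+$ is below 1, and the components of this graph are the finest blocks compatible with that constraint.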


Testing composite multivariate Poisson distributions. We now consider the previous setting, but we are no longer interested in determining the true hypothesis; instead we want to decide between two (or more) disjoint subsets of hypotheses. Under hypothesis 1, the distribution of $D$ belongs to a set of possible distributions, namely $\mathcal{P}(\lambda(i))$ where $i\in A$, and under hypothesis 2, the distribution of $D$ belongs to another set of distributions, namely $\mathcal{P}(\lambda(i))$ where $i\in B$. Note that $A$ and $B$ are disjoint subsets such that $A\cup B=[k]$. In short,

$$D\mid H=1\sim\mathcal{P}(\lambda(i)),\ \text{for some } i\in A, \tag{64}$$

$$D\mid H=2\sim\mathcal{P}(\lambda(i)),\ \text{for some } i\in B, \tag{65}$$

and as before the prior on $\lambda(i)$ is $p_i$. To minimize the probability of deciding the wrong hypothesis upon observing a realization of $D$, we must pick the hypothesis which leads to the larger probability between $\mathbb{P}\{H\in A\mid D=d\}$ and $\mathbb{P}\{H\in B\mid D=d\}$, or equivalently,

$$\sum_{i\in A}\mathcal{P}_{\lambda(i)}(d)\,p_i>\sum_{i\in B}\mathcal{P}_{\lambda(i)}(d)\,p_i\ \Rightarrow\ H=1, \tag{66}$$

$$\sum_{i\in A}\mathcal{P}_{\lambda(i)}(d)\,p_i<\sum_{i\in B}\mathcal{P}_{\lambda(i)}(d)\,p_i\ \Rightarrow\ H=2. \tag{67}$$
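In other words, the problem is similar to the previous one, using the above mixture distributions. If we denote by $P_e$ the probability of making an error with this test, we have

$$P_e=\sum_{x\in\mathbb{Z}_+^k}\min\Big(\sum_{i\in A}\mathcal{P}_{\lambda(i)}(x)\,p_i,\ \sum_{i\in B}\mathcal{P}_{\lambda(i)}(x)\,p_i\Big). \tag{68}$$

Moreover, since $\min(\sum_{i\in A}a_i,\sum_{j\in B}b_j)$ is at most $\sum_{i\in A,j\in B}\min(a_i,b_j)$ and at least $\max_{i\in A,j\in B}\min(a_i,b_j)$ for nonnegative $a_i,b_j$, we obtain the bounds

$$P_e\le\sum_{x\in\mathbb{Z}_+^k}\sum_{i\in A,\,j\in B}\min\big(\mathcal{P}_{\lambda(i)}(x)\,p_i,\ \mathcal{P}_{\lambda(j)}(x)\,p_j\big), \tag{69}$$

$$P_e\ge\frac{1}{|A|\,|B|}\sum_{x\in\mathbb{Z}_+^k}\sum_{i\in A,\,j\in B}\min\big(\mathcal{P}_{\lambda(i)}(x)\,p_i,\ \mathcal{P}_{\lambda(j)}(x)\,p_j\big). \tag{70}$$

Therefore, for constant $k$ and $\lambda(i)=\ln(n)c_i$, $c_i\in\mathbb{R}_+^k$, with $n$ diverging, it suffices to control the decay of $\sum_{x\in\mathbb{Z}_+^k}\min(\mathcal{P}_{\lambda(i)}(x)p_i,\ \mathcal{P}_{\lambda(j)}(x)p_j)$ when $i\in A$ and $j\in B$, in order to bound the error probability of deciding whether a vertex degree profile belongs to a group of communities or not.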
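The composite test (66)-(67) and the bounds (69)-(70) can be checked on a small example. The Python sketch below is illustrative, with hypothetical means, priors and groups $A=\{0,1\}$, $B=\{2\}$; it computes $P_e$ from (68) with the sum over $x$ truncated at a cutoff where the neglected mass is negligible.

```python
import itertools
import math


def pmf(d, lam):
    """Multivariate Poisson pmf with independent coordinates, cf. (35)-(36)."""
    return math.exp(sum(x * math.log(l) - l - math.lgamma(x + 1) for x, l in zip(d, lam)))


def composite_error(lams, priors, A, B, cutoff):
    """P_e of the mixture test (66)-(67), truncated to d < cutoff, with bounds (70) and (69)."""
    pe = pairwise = 0.0
    for d in itertools.product(range(cutoff), repeat=len(lams[0])):
        w = [pmf(d, lam) * p for lam, p in zip(lams, priors)]
        pe += min(sum(w[i] for i in A), sum(w[j] for j in B))  # summand of (68)
        pairwise += sum(min(w[i], w[j]) for i in A for j in B)
    return pe, pairwise / (len(A) * len(B)), pairwise


lams = [[4.0, 1.0], [3.0, 1.5], [1.0, 4.0]]
pe, lower, upper = composite_error(lams, [0.4, 0.3, 0.3], A=[0, 1], B=[2], cutoff=35)
print(lower <= pe <= upper)  # True
```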

The same reasoning can be applied to the problem of deciding whether a given node belongs to a group of communities, with more than two groups. Also, for any $p$ and $p'$ such that $|p_j-p'_j|<\ln n/\sqrt{n}$ for each $j$, $Q$, $\gamma(n)$, and $i$,

$$\sum_{x\in\mathbb{Z}_+^k}\max\Big(\mathcal{P}_{\mathrm{Bin}(np',\,(1-\gamma(n))\ln(n)Q_i/n)}(x)-2\,\mathcal{P}_{(1-\gamma(n))\ln(n)(PQ)_i}(x),\ 0\Big)=O(1/n^2). \tag{71}$$

So, the error rate of any algorithm that classifies vertices based on their degree profile in a graph drawn from a sparse SBM is at most $O(1/n^2)$ more than twice what it would be if the degree profiles were really Poisson distributed.

In summary, we have proved the following.

Lemma 12. Let $k\in\mathbb{Z}_+$ and let $A_1,\ldots,A_t$ be disjoint subsets of $[k]$ such that $\bigcup_{i=1}^t A_i=[k]$. Let $G$ be a random graph drawn under $G_2(n,p,(1-\gamma(n))Q)$. Assigning the most likely community subset $A_i$ to a node $v$ based on its degree profile $d(v)$ gives the correct assignment with probability

$$1-O\Big(n^{-(1-\gamma(n))\Delta-\frac{\ln((1-\gamma(n))\ln n)}{2\ln n}}+\frac{1}{n^2}\Big),$$

where

$$\Delta=\min_{r\neq s\in[t]}\ \min_{i\in A_r,\,j\in A_s}D_+((PQ)_i,(PQ)_j). \tag{72}$$

Moreover, we will need the following "robust" version of this lemma to prove Theorem 6.

Lemma 13. Let $k\in\mathbb{Z}_+$ and let $A_1,\ldots,A_t$ be disjoint subsets of $[k]$ such that $\bigcup_{i=1}^t A_i=[k]$. Let $G$ be a random graph drawn under $G_2(n,p,(1-\gamma(n))Q)$. There exist $c_1$, $c_2$, and $C_3$ such that for any $\delta$, assigning the most likely community subset $A_i$ to a node $v$ based on a distortion of its degree profile that independently gets each node's community wrong with probability at most $\delta$ gives the correct assignment with probability at least

$$1-c_2\cdot(1+c_1\delta)^{C_3\ln n}\cdot\Big(n^{-(1-\gamma(n))\Delta-\frac{\ln((1-\gamma(n))\ln n)}{2\ln n}}+\frac{1}{n^2}\Big),$$

where

$$\Delta=\min_{r\neq s\in[t]}\ \min_{i\in A_r,\,j\in A_s}D_+((PQ)_i,(PQ)_j). \tag{73}$$


Proof. Let

$$c_1=\max_{i,j}\frac{\sum_{i'}p_{i'}Q_{i',j}}{p_iQ_{i,j}}.$$

The key observation is that, for $v$ in community $j$, the $m$-th neighbor of $v$ had at least a $\min_i p_iQ_{i,j}/\sum_{i'}p_{i'}Q_{i',j}\ge 1/c_1$ chance of actually being in community $a$, for each $a$. Its probability of being reported as being in community $a$ is therefore at most that probability plus $\delta$, hence at most $1+c_1\delta$ times the probability that it actually is in community $a$. So, the probability that the reported degree profile of $v$ is bad is at most $(1+c_1\delta)^{\deg(v)}$ times the probability that its actual degree profile is bad. Choose $C_3$ such that each vertex in the graph has degree less than $C_3\ln n$ with probability $1-O(1/n^2)$, and the conclusion follows from this and the previous bounds on the probability that classifying a vertex based on its degree profile fails. □
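The key observation in this proof can be checked numerically. The Python sketch below is illustrative, with hypothetical $p$, $Q$ and $\delta$: it computes $c_1$ and, for every pair $(j,a)$, the worst-case probability that a neighbor of a community-$j$ vertex is reported in community $a$ when its label is wrong with probability at most $\delta$ (assuming, pessimistically, that every error lands on $a$), and confirms that it never exceeds $(1+c_1\delta)$ times the true probability.

```python
def c1_constant(p, Q):
    """c_1 = max_{i,j} sum_{i'} p_{i'} Q_{i'j} / (p_i Q_{ij}), as in the proof of Lemma 13."""
    k = len(p)
    return max(sum(p[a] * Q[a][j] for a in range(k)) / (p[i] * Q[i][j])
               for i in range(k) for j in range(k))


def worst_reported_probability(p, Q, j, a, delta):
    """True and worst-case reported probability that a neighbour of a community-j
    vertex is in community a, if its label is wrong with probability delta and
    every wrong label is assumed to be a."""
    k = len(p)
    actual = p[a] * Q[a][j] / sum(p[i] * Q[i][j] for i in range(k))
    return actual, (1 - delta) * actual + delta


p, Q, delta = [0.2, 0.3, 0.5], [[5.0, 1.0, 2.0], [1.0, 4.0, 1.0], [2.0, 1.0, 6.0]], 0.05
c1 = c1_constant(p, Q)
for j in range(3):
    for a in range(3):
        actual, reported = worst_reported_probability(p, Q, j, a, delta)
        print(j, a, reported <= (1 + c1 * delta) * actual)  # True in every case
```

Raising this per-neighbor factor to the power of the degree, at most $C_3\ln n$, gives the $(1+c_1\delta)^{C_3\ln n}$ term in Lemma 13.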

8.3 Proof of Theorem 6 

We break the proof into two parts, the possibility and impossibility parts. 

Claim 1 (achievability). Let $G\sim G_2(n,p,Q)$ and $\gamma=[\ldots]$, where

$$\Delta=\min_{r\neq s\in[t]}\ \min_{i\in A_r,\,j\in A_s}D_+((PQ)_i,(PQ)_j)$$

and $A_1,\ldots,A_t$ is a partition of $[k]$. Degree-profiling$(G,p,Q,\gamma)$ recovers the partition $[k]=\bigsqcup_{s=1}^t A_s$ with probability $1-o_n(1)$ if for all $i,j$ in $[k]$ that are in different subsets,

$$D_+((PQ)_i,(PQ)_j)\ge 1. \tag{74}$$

The idea behind Claim 1 is contained in Lemma 12. However, there are several technical steps that need to be handled:

1. The graphs $G'$ and $G''$ obtained in step 1 of the algorithm are correlated, since an edge cannot be both in $G'$ and $G''$. However, this effect can be discarded since two independent versions would share edges with low enough probability.

2. The classification in step 2 using Sphere-comparison has a vanishing fraction of vertices which are wrongly labeled. This requires using the robust version of Lemma 12, namely Lemma 13.

3. In the case where $D_+((PQ)_i,(PQ)_j)=1$, a more careful classification is needed, as carried out in steps 4 and 5 of the algorithm.
Proof. With probability $1-O(1/n)$, no vertex in the graph has degree greater than $C_3\ln n$. Assuming that this holds, no vertex's set of neighbors in $G''$ is more than

$$\big(1-\max_{i,j}Q_{ij}\ln n/n\big)^{-C_3\ln n}\cdot\big(n/(n-C_3\ln n)\big)^{C_3\ln n}=1+o(1)$$

times as likely to occur as it would be if $G''$ were independent of $G'$. So, the fact that they are not has negligible impact on the algorithm's error rate. As $n$ goes to infinity, the expected error rate of $\sigma'$ goes to 0, so the expected error in the algorithm's estimates of the connectivity rates between its communities also goes to 0. Thus, the algorithm labels the communities correctly (up to equivalent relabeling) with probability $1-o(1)$. Now, let

$$\delta=\Big(e^{\frac{1}{2C_3}\min_{i\neq j}D_+((PQ)_i,(PQ)_j)}-1\Big)\Big/c_1.$$

By Lemma 13, if the classification in step 2 has an error rate of at most $\delta$, then the classification in step 4 has an error rate of

$$O\Big(n^{-(1-\gamma)\min_{i\neq j}D_+((PQ)_i,(PQ)_j)/2}+1/n^2\Big),$$

observing that if $\sigma'_v\neq\sigma_v$, the error rate of $\sigma''_{v'}$ for $v'$ adjacent to $v$ is at worst multiplied by a constant. That in turn ensures that the final classification has an error rate of at most

$$O\Big(\big(1+O(n^{-(1-\gamma)\min_{i\neq j}D_+((PQ)_i,(PQ)_j)/2}+1/n^2)\big)^{C_3\ln n}\cdot\frac{\ln^{-1/4}n}{n}\Big)=O\Big(\frac{\ln^{-1/4}n}{n}\Big),$$

so, by a union bound over the $n$ vertices, every vertex is classified correctly with probability $1-o(1)$.

□
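The dependence factor $\big(1-\max_{i,j}Q_{ij}\ln n/n\big)^{-C_3\ln n}\big(n/(n-C_3\ln n)\big)^{C_3\ln n}$ at the start of this proof can be evaluated directly. The Python sketch below is illustrative: `q_max` $=20$ and $C_3=10$ are hypothetical values, and the computation is done in log space with `math.log1p` for accuracy. It shows the factor decreasing toward 1, although only slowly for moderate $n$.

```python
import math


def dependence_factor(n, q_max, C3):
    """(1 - q_max ln n / n)^(-C3 ln n) * (n / (n - C3 ln n))^(C3 ln n), in log space."""
    L = math.log(n)
    log_factor = (-C3 * L * math.log1p(-q_max * L / n)   # first factor
                  - C3 * L * math.log1p(-C3 * L / n))    # (n / (n - C3 L))^(C3 L)
    return math.exp(log_factor)


for n in (1e3, 1e6, 1e9, 1e12):
    print(f"{n:.0e}", dependence_factor(n, q_max=20.0, C3=10.0))
```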

Claim 2 (converse). Let $G\sim G_2(n,p,Q)$ and $A_1,\ldots,A_t$ a partition of $[k]$. If there exist $r,s\in[t]$, $r\neq s$, $i\in A_r$, and $j\in A_s$ such that

$$D_+((PQ)_i,(PQ)_j)<1, \tag{75}$$

then every algorithm classifying the vertices of $G$ into elements of $\{A_1,\ldots,A_t\}$ must misclassify at least one vertex with probability $1-o_n(1)$.

Proof. With probability $1-o(1)$, every community of $G$ has a size that is within a factor of $1+\ln n/\sqrt{n}$ of its expected size. Assume that this holds. Let $S$ be a random set of $n/\ln^3(n)$ of $G$'s vertices. With probability $1-o(1)$, the number of vertices in $S$ in community $i$ is within $\sqrt{n}$ of $p_in/\ln^3 n$ for each $i$, and a randomly selected vertex in $S$ is adjacent to another vertex in $S$ with probability $o(1)$. Next, choose $i$ and $j$ such that $i\in A_r$, $j\in A_s$, $r\neq s$, and $D_+((PQ)_i,(PQ)_j)<1$. Now, let

$$x_\ell=(PQ)_{\ell,i}^t\,(PQ)_{\ell,j}^{1-t}\ln n \tag{76}$$

for each $\ell\in[k]$, where $t\in[0,1]$ is chosen to maximize

$$\sum_{\ell\in[k]}t(PQ)_{\ell,i}+(1-t)(PQ)_{\ell,j}-(PQ)_{\ell,i}^t(PQ)_{\ell,j}^{1-t}. \tag{77}$$

Roughly speaking, we want to show that every vertex in community $i$ or $j$ has a degree profile of $x$ with probability

$$\Omega\big(n^{-D_+((PQ)_i,(PQ)_j)}/\ln^{k/2}(n)\big) \tag{78}$$
and that there is no way to determine which community the vertices with this degree profile are in.
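The two properties of the profile $x$ in (76) can be illustrated numerically. The Python sketch below is not from the paper. It uses hypothetical parameters $p=(1/2,1/2)$, $Q_{ii}=3$, $Q_{ij}=1$, treats $\ln n=30$ directly as a parameter, and extends $x!$ to non-integer $x$ through `math.lgamma`. It finds the maximizing $t$ of (77) by bisection on the derivative, which is decreasing because the objective is concave. It then checks, first, that $x$ has the same likelihood under communities $i$ and $j$ (this follows from the first-order condition for $t$) and, second, that this common log-likelihood equals $-D_+\ln n-\frac12\sum_\ell\ln(2\pi x_\ell)$ up to Stirling's remainder, which is the $\ln^{k/2}(n)$ factor in (78).

```python
import math


def objective_derivative(c_i, c_j, t):
    """Derivative in t of sum_l t*a + (1-t)*b - a^t b^(1-t); decreasing by concavity."""
    return sum(a - b - a ** t * b ** (1 - t) * math.log(a / b) for a, b in zip(c_i, c_j))


def optimal_t(c_i, c_j, iters=100):
    """Bisection for the maximiser of (77): the derivative is >= 0 at t = 0 and <= 0 at t = 1."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if objective_derivative(c_i, c_j, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def log_poisson(x, lam):
    """Log of a product of Poisson pmfs, with x! extended to real x via lgamma."""
    return sum(xl * math.log(l) - l - math.lgamma(xl + 1) for xl, l in zip(x, lam))


p, Q, log_n = [0.5, 0.5], [[3.0, 1.0], [1.0, 3.0]], 30.0
c_i = [p[l] * Q[l][0] for l in range(2)]  # column (PQ)_i
c_j = [p[l] * Q[l][1] for l in range(2)]  # column (PQ)_j
t = optimal_t(c_i, c_j)
D = sum(t * a + (1 - t) * b - a ** t * b ** (1 - t) for a, b in zip(c_i, c_j))
x = [a ** t * b ** (1 - t) * log_n for a, b in zip(c_i, c_j)]  # ambiguous profile (76)
li = log_poisson(x, [log_n * a for a in c_i])
lj = log_poisson(x, [log_n * b for b in c_j])
print(t, D)      # t = 1/2 by symmetry; D = (sqrt(3) - 1)^2 / 2 < 1
print(li - lj)   # ~0: x is equally likely under communities i and j
print(li + D * log_n + 0.5 * sum(math.log(2 * math.pi * xl) for xl in x))  # small Stirling remainder
```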

More precisely, call a vertex in $S$ ambiguous if, for each $\ell$, it has exactly $x_\ell$ neighbors in community $\ell$ that are not in $S$. The probability distribution of a vertex's numbers of neighbors in $G\setminus S$ in each community is essentially a multivariate Poisson distribution. So, by the assumption that $D_+((PQ)_i,(PQ)_j)<1$ and the argument in Lemma 11, there exists $\epsilon>0$ such that a vertex in $S$ that is in either community $i$ or community $j$ is ambiguous with probability $\Omega(n^{\epsilon-1})$. Furthermore, for a fixed community assignment and choice of $S$, there is no dependence between whether or not any two vertices are ambiguous. Also, an ambiguous vertex is not adjacent to any other vertex in $S$ with probability $1-o(1)$. So, with probability $1-o(1)$ there are at least $\ln n$ ambiguous vertices in community $i$ that are not adjacent to any other vertices in $S$ and $\ln n$ ambiguous vertices in community $j$ that are not adjacent to any other vertices in $S$. These vertices are indistinguishable, so no algorithm classifies them all correctly with probability greater than $1/\binom{2\ln n}{\ln n}$. Therefore, every algorithm must misclassify at least one vertex with probability $1-o(1)$. □